
Frequency, Dispersion, Association, and Keyness

Revising and tupleizing corpus-linguistic measures


This book revisits the main statistical measures specific to corpus linguistics that the field has relied on for decades: frequency, dispersion, association, and keyness. It first discusses the purpose of these measures and how they have been computed. It then makes three main proposals. First, many measures of dispersion, association, and keyness are too confounded with frequency, and the book shows how to 'take frequency out of them' to obtain conceptually cleaner and more interpretable measures. Second, many existing measures can be replaced by a simple information-theoretic measure, the Kullback-Leibler divergence, which can likewise have frequency 'removed' from it. Third, corpus linguistics should abandon the tradition of describing its findings with a single number and adopt a tupleization approach instead, in which several separate dimensions of information are used for description and interpretation. The book is written in an informal, hands-on style and comes with its own R package featuring functions, example data, and several thousand lines of code exemplifying all applications.

References

  1. Ackermann, Kirsten & Yu-Hua Chen
    2013 Developing the Academic Collocation List: A corpus-driven and expert-judged approach. Journal of English for Academic Purposes12(4). 235–247.
    [Google Scholar]
  2. Adelman, James S. , Gordon D. A. Brown , & José F. Quesada
    2006 Contextual Diversity, not word frequency, determines word-naming and lexical decision times. Psychological Science19(9). 814–823. 10.1111/j.1467‑9280.2006.01787.x
    https://doi.org/10.1111/j.1467-9280.2006.01787.x [Google Scholar]
  3. Adèr, Herman J.
    2008 Modelling. In Herman J. Adèr & Gideon J. Mellenbergh (eds.), Advising on research methods: A consultant’s companion, 271–304. Huizen: Johannes van Kessel Publishing.
    [Google Scholar]
  4. Ambridge, Ben , Anna L. Theakston , Elena V. M. Lieven , & Michael Tomasello
    2006 The distributed learning effect for children’s acquisition of an abstract syntactic construction. Cognitive Development21(2). 174–193. 10.1016/j.cogdev.2005.09.003
    https://doi.org/10.1016/j.cogdev.2005.09.003 [Google Scholar]
  5. Archer, Dawn
    (ed.) 2009What’s in a word-list? Investigating word frequency and keyword extraction. London: Routledge.
    [Google Scholar]
  6. Arppe, Antti
    2008 Univariate, bivariate and multivariate methods in corpus-based lexicography – A study of synonymy. Ph.D. dissertation, University of Alberta.
  7. Aslin, Richard N. & Elissa L. Newport
    2012 Statistical learning: From acquiring specific items to forming general rules. Current Directions in Psychological Science21(3). 170–176. 10.1177/0963721412436806
    https://doi.org/10.1177/0963721412436806 [Google Scholar]
  8. Baayen, R. Harald
    2010 Demythologizing the word frequency effect: A discriminative learning perspective. The Mental Lexicon5(3). 436–461. 10.1075/ml.5.3.10baa
    https://doi.org/10.1075/ml.5.3.10baa [Google Scholar]
  9. 2011 Corpus linguistics and naive discriminative learning. Brazilian Journal of Applied Linguistics11(2). 295–328. 10.1590/S1984‑63982011000200003
    https://doi.org/10.1590/S1984-63982011000200003 [Google Scholar]
  10. Baayen, R. Harald , Petar Milin , Dusica Filipović-Đurđević, D. , Peter Hendrix , & Marco Marelli
    2011 An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review118(3). 438–481. 10.1037/a0023851
    https://doi.org/10.1037/a0023851 [Google Scholar]
  11. Babych, Bogdan & Anthony Hartley
    2011 Meta-evaluation of comparability metric using parallel corpora. International Journal of Computational Linguistics and Applications2(1–2). 209–222.
    [Google Scholar]
  12. Baguley, Thom
    2012Serious stats: A guide to advanced statistics for the behavioral sciences. Houndmills: Palgrave Macmillan. 10.1007/978‑0‑230‑36355‑7
    https://doi.org/10.1007/978-0-230-36355-7 [Google Scholar]
  13. Baker, Paul
    2004 Querying keywords: Questions in difference, frequency, and sense in keyword analysis. Journal of English Linguistics32(4). 346–359. 10.1177/0075424204269894
    https://doi.org/10.1177/0075424204269894 [Google Scholar]
  14. Baker, Paul & Tony McEnery
    2005 A corpus-based approach to discourses of refugees and asylum seekers in UN and newspaper texts. Journal of Language and Politics4(2). 197–226. 10.1075/jlp.4.2.04bak
    https://doi.org/10.1075/jlp.4.2.04bak [Google Scholar]
  15. Baron, Alistair , Paul Rayson , & Dawn Archer
    2009 Word frequency and keyword statistics in historical corpus linguistics. Anglistik: International Journal of English Studies20(1). 41–67.
    [Google Scholar]
  16. Bavaud, François
    2009 Information theory, relative entropy and statistics. In G. Sommaruga (ed.), Formal theories of information: Lecture notes in computer science, 54–78. Berlin: Springer. 10.1007/978‑3‑642‑00659‑3_3
    https://doi.org/10.1007/978-3-642-00659-3_3 [Google Scholar]
  17. Belov, Dmitry I. & Ronald D. Armstrong
    2011 Distributions of the Kullback-Leibler divergence with applications. British Journal of Mathematical and Statistical Psychology64(2). 291–309. 10.1348/000711010X522227
    https://doi.org/10.1348/000711010X522227 [Google Scholar]
  18. Berger, Cynthia , Scott Crossley , & Stephen Skalicky
    2019 Using lexical features to investigate second language lexical decision performance. Studies in Second Language Acquisition41(5). 911–935. 10.1017/S0272263119000019
    https://doi.org/10.1017/S0272263119000019 [Google Scholar]
  19. Berry-Rogghe, Godelieve L. M.
    1974 Automatic identification of phrasal verbs. In John L. Mitchell (ed.), Computers in the humanities, 16–26. Edinburgh: Edinburgh University Press.
    [Google Scholar]
  20. Bestgen, Yves & Sylviane Granger
    2014 Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing26. 28–41. 10.1016/j.jslw.2014.09.004
    https://doi.org/10.1016/j.jslw.2014.09.004 [Google Scholar]
  21. Biber, Douglas , Randi Reppen , Erin Schnur , & Romy Ghanem
    2016 On the (non)utility of Juilland’s D to measure lexical dispersion in large corpora. International Journal of Corpus Linguistics21(4). 439–464. 10.1075/ijcl.21.4.01bib
    https://doi.org/10.1075/ijcl.21.4.01bib [Google Scholar]
  22. Bondi, Marina & Mike Scott
    (eds.) 2010Keyness in texts. Amsterdam: John Benjamins. 10.1075/scl.41
    https://doi.org/10.1075/scl.41 [Google Scholar]
  23. Bortz, Jürgen , Gustav A. Lienert , & Klaus Boehnke
    2008Verteilungsfreie Methoden in der Biostatistik. 3rd corr. ed.Heidelberg: Springer Medizin Verlag.
    [Google Scholar]
  24. Bouma, Gerlof
    2009 Normalized (Pointwise) Mutual Information in collocation extraction. Proceedings of the Biennial GSCL Conference30. 31–40.
    [Google Scholar]
  25. Bresnan, Joan , Anna Cueni , Tatiana Nikitina , & R. Harald Baayen
    2007 Predicting the dative alternation. In Gerlof Bouma , Irene Kraemer , & Joost Zwarts (eds.), Cognitive foundations of interpretation, 69–94. Amsterdam: Royal Netherlands Academy of Arts and Sciences.
    [Google Scholar]
  26. Brezina, Vaclav , & Miriam Meyerhoff
    2014 Significant or random? A critical review of sociolinguistic generalisations based on large corpora. International Journal of Corpus Linguistics19(1). 1–28. 10.1075/ijcl.19.1.01bre
    https://doi.org/10.1075/ijcl.19.1.01bre [Google Scholar]
  27. Brysbaert, Marc & Boris New
    2009 Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods41(4). 977–990. 10.3758/BRM.41.4.977
    https://doi.org/10.3758/BRM.41.4.977 [Google Scholar]
  28. Burch, Brent , Jesse Egbert , & Douglas Biber
    2017 Measuring and interpreting lexical dispersion in corpus linguistics. Journal of Research Design and Statistics in Linguistics and Communication Science3(2). 189–216. 10.1558/jrds.33066
    https://doi.org/10.1558/jrds.33066 [Google Scholar]
  29. Burnham, Kenneth P. & David R. Anderson
    2001 Kullback-Leibler information as a basis for strong inference in ecological studies. Wildlife Research28(2). 111–119. 10.1071/WR99107
    https://doi.org/10.1071/WR99107 [Google Scholar]
  30. Bybee, Joan & Sandra A. Thompson
    1997 Three frequency effects in syntax. Berkeley Linguistics Society23. 65–85. 10.3765/bls.v23i1.1293
    https://doi.org/10.3765/bls.v23i1.1293 [Google Scholar]
  31. Carroll, John B.
    1970 An alternative to Juilland’s usage coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies in the Humanities and Verbal Behaviour3(2). 61–65.
    [Google Scholar]
  32. Charles, Walter G. , & George A. Miller
    1989 Contexts of antonymous adjectives. Applied Psycholinguistics10(3). 357–375. 10.1017/S0142716400008675
    https://doi.org/10.1017/S0142716400008675 [Google Scholar]
  33. Chen, Stanley F. & Joshua Goodman
    1999 An empirical study of smoothing techniques for language modeling. Computer Speech and Language13(4). 359–394. 10.1006/csla.1999.0128
    https://doi.org/10.1006/csla.1999.0128 [Google Scholar]
  34. Church, Kenneth W.
    2000 Empirical estimates of adaptation: The chance of two Noriegas is closer to p /2 than p 2 . InProceedings of the COLING 2000 (The 18th international conference on computational linguistics). np. 10.3115/990820.990847
    https://doi.org/10.3115/990820.990847 [Google Scholar]
  35. Church, Kenneth W. William Gale , Patrick Hanks , & Douglas Hindle
    1991 Using statistics in lexical analysis. In Uri Zernik (ed.), Lexical acquisition: Exploiting on-line resources to build a lexicon, 115–164. Hillsdale, NJ: Lawrence Erlbaum Associates.
    [Google Scholar]
  36. Church, Kenneth W. & Patrick Hanks
    1993 Word association norms, mutual information, and lexicography. Computational Linguistics16(1). 22–29.
    [Google Scholar]
  37. Collins, Peter
    2021 Cultural keywords in World Englishes: A GloWbE-based study. ICAME Journal45. 5–35. 10.2478/icame‑2021‑0001
    https://doi.org/10.2478/icame-2021-0001 [Google Scholar]
  38. Cover, Thomas H. & Joy A. Thomas
    2006Elements of information theory. 2nd ed.Hoboken, NJ: John Wiley.
    [Google Scholar]
  39. Culpeper, Jonathan
    2002 Computers, language and characterisation: An analysis of six characters in Romeo and Juliet. In Ulla Melander-Marttala , Carin Östman , & Merja Kyto (eds.), Conversation in life and in literature, 11–30. Uppsala: Association Suédoise de Linguistique Appliquée.
    [Google Scholar]
  40. 2009 Words, parts-of-speech and semantic categories in the character-talk of Shakespeare’s Romeo and Juliet. International Journal of Corpus Linguistics14(1). 29–59. 10.1075/ijcl.14.1.03cul
    https://doi.org/10.1075/ijcl.14.1.03cul [Google Scholar]
  41. Cvrček, Václav & Masako Fidler
    2019 More than keywords: Discourse prominence analysis of the Russian web portal Sputnik Czech Republic. In M. Berrocal & A. Salamurović (eds.), Political discourse in Central, Eastern and Balkan Europe, 93–117. Amsterdam John Benjamins. 10.1075/dapsac.84.05cvr
    https://doi.org/10.1075/dapsac.84.05cvr [Google Scholar]
  42. 2022 No keyword is an island: In search of covert associations. Corpora17(2). 259–290. 10.3366/cor.2022.0256
    https://doi.org/10.3366/cor.2022.0256 [Google Scholar]
  43. Damerau, Frederick J.
    1990 Evaluating computer-generated domain-oriented vocabularies. Information Processing and Management26(6). 791–801. 10.1016/0306‑4573(90)90052‑4
    https://doi.org/10.1016/0306-4573(90)90052-4 [Google Scholar]
  44. 1993 Generating and evaluating domain-oriented multi-word terms from texts. Information Processing and Management29(4). 433–447. 10.1016/0306‑4573(93)90039‑G
    https://doi.org/10.1016/0306-4573(93)90039-G [Google Scholar]
  45. Daudaravičius, Vidas & Ruta Marcinkevičienė
    2004 Gravity counts for the boundaries of collocations. International Journal of Corpus Linguistics9(2). 321–348. 10.1075/ijcl.9.2.08dau
    https://doi.org/10.1075/ijcl.9.2.08dau [Google Scholar]
  46. Degaetano-Ortlieb, Stefania & Elke Teich
    2016 Information-based modeling of diachronic linguistic change: From typicality to productivity. InProceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), 165–173. Berlin. 10.18653/v1/W16‑2121
    https://doi.org/10.18653/v1/W16-2121 [Google Scholar]
  47. 2022 Toward an optimal code for communication: The case of scientific English. Corpus Linguistics and Linguistic Theory18(1). 175–207. 10.1515/cllt‑2018‑0088
    https://doi.org/10.1515/cllt-2018-0088 [Google Scholar]
  48. Davies, Mark & Dee Gardner
    2010A frequency dictionary of contemporary American English: Word sketches, collocates and thematic lists. London: Routledge.
    [Google Scholar]
  49. Do, Youngah & Ryan Ka Yau Lai
    2019 Large-sample confidence intervals of information-theoretic measures in linguistics. Journal of Research Design and Statistics in Linguistics and Communication Science6(1). 19–54
    [Google Scholar]
  50. Dunning, Ted
    1993 Accurate methods for the statistics of surprise and coincidence. Computational Linguistics19(1). 61–74.
    [Google Scholar]
  51. Durlak, Joseph A.
    2009 How to select, calculate, and interpret effect sizes. Journal of Pediatric Psychology34(9). 917–928. 10.1093/jpepsy/jsp004
    https://doi.org/10.1093/jpepsy/jsp004 [Google Scholar]
  52. Durrant, Phil & Norbert Schmitt
    2009 To what extent do native and non-native writers make use of collocations?International Review of Applied Linguistics47. 157–177. 10.1515/iral.2009.007
    https://doi.org/10.1515/iral.2009.007 [Google Scholar]
  53. Edmundson, Harold P. & W. Wyllys
    1961 Automatic abstracting and indexing – Survey and recommendations. Communications of the ACM4. 226–234. 10.1145/366532.366545
    https://doi.org/10.1145/366532.366545 [Google Scholar]
  54. Egbert, Jesse & Douglas Biber
    2019 Incorporating text dispersion into keyword analyses. Corpora14(1). 77–104. 10.3366/cor.2019.0162
    https://doi.org/10.3366/cor.2019.0162 [Google Scholar]
  55. Ellis, Nick C.
    2006 Language acquisition as rational contingency learning. Applied Linguistics27(1). 1–24. 10.1093/applin/ami038
    https://doi.org/10.1093/applin/ami038 [Google Scholar]
  56. Ellis, Nick C. , Ute Römer , & Matthew Brook O’Donnell
    2016Usage-based approaches to language acquisition and processing. New York, NY: Wiley-Blackwell.
    [Google Scholar]
  57. Ellis, Nick C. & Rita Simpson-Vlach
    2005 An academic formulas list (AFL): Extraction, validation, prioritization. Paper presented atPhraseology2005, Université Catholique Louvain-la-Neuve.
    [Google Scholar]
  58. Ellis, Nick. C. , Rita Simpson-Vlach , & Carson Maynard
    2007 The processing of formulas in native and L2 speakers: psycholinguistic and corpus determinants. Paper presented at theSymposium on Formulaic Language, University of Wisconsin-Milwaukee.
    [Google Scholar]
  59. Eskridge, William N. , Brian G. Slocum , & Stefan Th. Gries
    2021 The meaning of sex: Dynamic words, novel applications, and original public meaning. Michigan Law Review119(7). 1503–1580.
    [Google Scholar]
  60. Evert, Stefan
    2009 Corpora and collocations. In Anke Lüdeling & Merja Kytö (eds.), Corpus linguistics: An international handbook, Vol.2, 1212–1248. Berlin: Mouton de Gruyter. 10.1515/9783110213881.2.1212
    https://doi.org/10.1515/9783110213881.2.1212 [Google Scholar]
  61. Evert, Stefan & Brigitte Krenn
    2001 Methods for the qualitative evaluation of lexical association measures. InProceedings of the 39th Annual Meeting of the Association for Computational Linguistics, 188–195. Toulouse. 10.3115/1073012.1073037
    https://doi.org/10.3115/1073012.1073037 [Google Scholar]
  62. Fankhauser, Peter , Jörg Knappen , & Elke Teich
    2014 Exploring and visualizing variation in language resources. InProceedings of the 9th International Conference on Language Resources and Evaluation (LREC’14), 4125–4128.
    [Google Scholar]
  63. Fidler, Masako & Václav Cvrček
    2015 A data-driven analysis of reader viewpoints: reconstructing the historical reader using keyword analysis. Journal of Slavic Linguistics23(2). 197–239. 10.1353/jsl.2015.0018
    https://doi.org/10.1353/jsl.2015.0018 [Google Scholar]
  64. Firth, John R.
    1957Studies in linguistic analysis. Oxford: Basil Blackwell.
    [Google Scholar]
  65. Francis, W. Nelson & Henry Kučera
    1982Frequency analysis of English usage: Lexicon and grammar. Boston, MA: Houghton Mifflin.
    [Google Scholar]
  66. Forster, Kenneth I. & Susan M. Chambers
    1973 Lexical access and naming time. Journal of Verbal Learning and Verbal Behavior12(6). 627–635. 10.1016/S0022‑5371(73)80042‑8
    https://doi.org/10.1016/S0022-5371(73)80042-8 [Google Scholar]
  67. Fidelholtz, James L.
    1975 Word frequency and vowel reduction in English. Chicago Linguistic Society11. 200–213.
    [Google Scholar]
  68. Gabrielatos, Costas
    2018 Keyness analysis: Nature, metrics and techniques. In Charlotte Taylor & Anna Marchi (eds.), Corpus approaches to discourse: A critical review, 225–258. London: Routledge. 10.4324/9781315179346‑11
    https://doi.org/10.4324/9781315179346-11 [Google Scholar]
  69. Gale, William A. & Geoffrey Sampson
    1995 Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics2(3). 217–237. 10.1080/09296179508590051
    https://doi.org/10.1080/09296179508590051 [Google Scholar]
  70. Gardner, Dee & Mark Davies
    2014 A new Academic Vocabulary List. Applied Linguistics35(3). 305–327.
    [Google Scholar]
  71. Garson, G. David
    1975Handbook of political science methods. 2nd ed.Boston, MA: Holbrook Press.
    [Google Scholar]
  72. Glenberg, Arthur M.
    1976 Monotonic and nonmonotonic lag effects in paired-associate and recognition memory paradigms. Journal of Verbal Learning and Verbal Behavior15(1). 1–15. 10.1016/S0022‑5371(76)90002‑5
    https://doi.org/10.1016/S0022-5371(76)90002-5 [Google Scholar]
  73. 1979 Component-levels theory of the effects of spacing of repetitions on recall and recognition. Memory and Cognition7(2). 95–112. 10.3758/BF03197590
    https://doi.org/10.3758/BF03197590 [Google Scholar]
  74. Goldberg, Adele E.
    1995Constructions: A Construction Grammar approach to argument structure. Chicago, IL: The University of Chicago Press.
    [Google Scholar]
  75. Goldberg, Adele E. , Devin M. Casenhiser , & Nitya Sethuraman
    2004 Learning argument structure generalizations. Cognitive Linguistics15(3). 289–316. 10.1515/cogl.2004.011
    https://doi.org/10.1515/cogl.2004.011 [Google Scholar]
  76. Gómez, Rebecca L.
    2002 Variability and detection of invariant structure. Psychological Science13(5). 431–436. 10.1111/1467‑9280.00476
    https://doi.org/10.1111/1467-9280.00476 [Google Scholar]
  77. Groom, Nicholas
    2009 Effects of second language immersion on second language collocational development. In Andy Barfield & Henrik Gyllstad (eds.), Researching collocations in another language, 21–33. Houndmills: Palgrave Macmillan. 10.1057/9780230245327_2
    https://doi.org/10.1057/9780230245327_2 [Google Scholar]
  78. Gries, Stefan Th.
    2003 Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics1. 1–27. 10.1075/arcl.1.02gri
    https://doi.org/10.1075/arcl.1.02gri [Google Scholar]
  79. Null-hypothesis significance testing of word frequencies: A follow-up on Kilgarriff. Corpus Linguistics and Linguistic Theory1(2). 277–294. 10.1515/cllt.2005.1.2.277
    https://doi.org/10.1515/cllt.2005.1.2.277 [Google Scholar]
  80. 2006 Exploring variability within and between corpora: Some methodological considerations. Corpora1(2). 109–151. 10.3366/cor.2006.1.2.109
    https://doi.org/10.3366/cor.2006.1.2.109 [Google Scholar]
  81. 2008 Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics13(4). 403–437. 10.1075/ijcl.13.4.02gri
    https://doi.org/10.1075/ijcl.13.4.02gri [Google Scholar]
  82. 2010 Dispersions and adjusted frequencies in corpora: Further explorations. In Stefan Th. Gries , Stefanie Wulff , & Mark Davies (eds.), Corpus linguistic applications: Current studies, new directions, 197–212. Amsterdam: Rodopi. 10.1163/9789042028012_014
    https://doi.org/10.1163/9789042028012_014 [Google Scholar]
  83. 2013 50-something years of work on collocations: What is or should be next …International Journal of Corpus Linguistics18(1). 137–165. 10.1075/ijcl.18.1.09gri
    https://doi.org/10.1075/ijcl.18.1.09gri [Google Scholar]
  84. 2016Quantitative corpus linguistics with R.2nd rev. & ext. ed.New York & London: Routledge, pp.274.
    [Google Scholar]
  85. 2018 The discriminatory power of lexical context for alternations: An information-theoretic exploration. Journal of Research Design and Statistics in Linguistics and Communication Science5(1–2). 78–106. 10.1558/jrds.38227
    https://doi.org/10.1558/jrds.38227 [Google Scholar]
  86. Gries, Stefan Th.
    2019a 15 years of collostructions: Some long overdue additions/corrections (to/of actually all sorts of corpus-linguistics measures). International Journal of Corpus Linguistics24(3). 385–412. 10.1075/ijcl.00011.gri
    https://doi.org/10.1075/ijcl.00011.gri [Google Scholar]
  87. Gries, Stefan Th.
    2019bTen lectures on corpus-linguistic approaches: Applications for usage-based and psycholinguistic research. Leiden: Brill. 10.1163/9789004410343
    https://doi.org/10.1163/9789004410343 [Google Scholar]
  88. 2020 Analyzing dispersion. In Magali Paquot & Stefan Th. Gries (eds.), A practical handbook of corpus linguistics, 99–118. Berlin: Springer. 10.1007/978‑3‑030‑46216‑1_5
    https://doi.org/10.1007/978-3-030-46216-1_5 [Google Scholar]
  89. 2021aStatistics for linguistics with R. 3rd rev. & ext. ed.Berlin: De Gruyter. 10.1515/9783110718256
    https://doi.org/10.1515/9783110718256 [Google Scholar]
  90. 2021b A new approach to (key) keywords analysis: Using frequency, and now also dispersion. Research in Corpus Linguistics9(2). 1–33. 10.32714/ricl.09.02.02
    https://doi.org/10.32714/ricl.09.02.02 [Google Scholar]
  91. 2022a What do (some of) our association measures measure (most)? Association?Journal of Second Language Studies5(1). 1–33. 10.1075/jsls.21028.gri
    https://doi.org/10.1075/jsls.21028.gri [Google Scholar]
  92. 2022b What do (most of) our dispersion measures measure (most)? Dispersion?Journal of Second Language Studies5(2). 171–205. 10.1075/jsls.21029.gri
    https://doi.org/10.1075/jsls.21029.gri [Google Scholar]
  93. 2022c Towards more careful corpus statistics: Uncertainty estimates for frequencies, dispersions, association measures, and more. Research Methods in Applied Linguistics1(1). 10.1016/j.rmal.2021.100002
    https://doi.org/10.1016/j.rmal.2021.100002 [Google Scholar]
  94. 2022d Multi-word units (and tokenization more generally): A multi-dimensional and largely information-theoretic approach. Lexis19. 10.4000/lexis.6231
    https://doi.org/10.4000/lexis.6231 [Google Scholar]
  95. 2024 Corrections to Nelson (2023): DP norm and D KLnorm are not wrong on pi at all. Journal of Quantitative Linguistics. 10.1080/09296174.2024.2324616
    https://doi.org/10.1080/09296174.2024.2324616 [Google Scholar]
  96. To appear. Cultural keywords in varieties research: Some suggestions to extend existing work. World Englishes.
    [Google Scholar]
  97. Gries, Stefan Th. , Beate Hampe , & Doris Schönefeld
    2005 Converging evidence: Bringing together experimental and corpus data on the association of verbs and constructions. Cognitive Linguistics16(4). 635–676. 10.1515/cogl.2005.16.4.635
    https://doi.org/10.1515/cogl.2005.16.4.635 [Google Scholar]
  98. Gries, Stefan Th. & Joybrato Mukherjee
    2010 Lexical gravity across varieties of English: An ICE-based study of n-grams in Asian Englishes. International Journal of Corpus Linguistics15(4). 520–548. 10.1075/ijcl.15.4.04gri
    https://doi.org/10.1075/ijcl.15.4.04gri [Google Scholar]
  99. Groom, Nicholas
    2009 Effects of second language immersion on second language collocational development. In Andy Barfield & Henrik Gyllstad (eds.), Researching collocations in another language, 21–33. Houndmills: Palgrave Macmillan. 10.1057/9780230245327_2
    https://doi.org/10.1057/9780230245327_2 [Google Scholar]
  100. Hackstein, Olav & Ryan Sandell
    2023 The rise of colligations: English can’t stand and German nicht ausstehen können . International Journal of Corpus Linguistics28(1). 60–90. 10.1075/ijcl.20022.hac
    https://doi.org/10.1075/ijcl.20022.hac [Google Scholar]
  101. Harris, Zellig S.
    1970Papers in structural and transformational linguistics. Dordrecht: Reidel. 10.1007/978‑94‑017‑6059‑1
    https://doi.org/10.1007/978-94-017-6059-1 [Google Scholar]
  102. Hilpert, Martin & Stefan Th. Gries
    2009 Assessing frequency changes in multi-stage diachronic corpora: Applications for historical corpus linguistics and the study of language acquisition. Literary and Linguistic Computing34(4). 385–401. 10.1093/llc/fqn012
    https://doi.org/10.1093/llc/fqn012 [Google Scholar]
  103. Hoffman, Elaine B. , Pranab K. Sen , Clarice R. Weinberg
    2001 Within-cluster resampling. Biometrika88(4). 1121–1134. 10.1093/biomet/88.4.1121
    https://doi.org/10.1093/biomet/88.4.1121 [Google Scholar]
  104. Howes, Davis H. & Richard L. Solomon
    1951 Visual duration threshold as a function of word probability. Journal of Experimental Psychology41(6). 401–410. 10.1037/h0056020
    https://doi.org/10.1037/h0056020 [Google Scholar]
  105. Hunston, Susan
    2002Corpora in applied linguistics. Cambridge: Cambridge University Press. 10.1017/CBO9781139524773
    https://doi.org/10.1017/CBO9781139524773 [Google Scholar]
  106. James, Gareth , Daniela Witten , Trevor Hastie , & Robert Tibshirani
    2021An introduction to statistical learning with applications in R. 2nd ed.Berlin: Springer. 10.1007/978‑1‑0716‑1418‑1
    https://doi.org/10.1007/978-1-0716-1418-1 [Google Scholar]
  107. Juilland, Alphonse G. , Dorothy R. Brodin , & Catherine Davidovitch
    1970Frequency dictionary of French words. The Hague: Mouton de Gruyter.
    [Google Scholar]
  108. Juilland, Alphonse & E. Chang-Rodriguez
    1964Frequency dictionary of Spanish words. The Hague: Mouton de Gruyter. 10.1515/9783112415467
    https://doi.org/10.1515/9783112415467 [Google Scholar]
  109. Justeson, John S. & Slava M. Katz
    1991 Co-occurrences of antonymous adjectives and their contexts. Computational Linguistics17(1). 1–20.
    [Google Scholar]
  110. Karlsson, Fred
    1985 Paradigms and word forms. Studia Gramatyczne7. 135–154.
    [Google Scholar]
  111. 1986 Frequency considerations in morphology. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung39(1). 19–28. 10.1524/stuf.1986.39.14.19
    https://doi.org/10.1524/stuf.1986.39.14.19 [Google Scholar]
  112. Koplenig, Alexander
    2017 A data-driven method to identify (correlated) changes in chronological corpora. Journal of Quantitative Linguistics24(4). 289–318. 10.1080/09296174.2017.1311447
    https://doi.org/10.1080/09296174.2017.1311447 [Google Scholar]
  113. Kullback, Solomon & Richard A. Leibler
    1951 On information and sufficiency. Annals of Mathematical Statistics22(1). 79–86. 10.1214/aoms/1177729694
    https://doi.org/10.1214/aoms/1177729694 [Google Scholar]
  114. Kuperman, Victor , Hans Stadthagen-Gonzalez , & Marc Brysbaert
    2012 Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods44. 978–990. 10.3758/s13428‑012‑0210‑4
    https://doi.org/10.3758/s13428-012-0210-4 [Google Scholar]
  115. Kyle, Kristopher & Scott A. Crossley
    2015 Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly49(4). 757–786. 10.1002/tesq.194
    https://doi.org/10.1002/tesq.194 [Google Scholar]
  116. Lachman, Roy
    1973 Uncertainty effects on time to access the internal lexicon. Journal of Experimental Psychology99(2). 199–208. 10.1037/h0034633
    https://doi.org/10.1037/h0034633 [Google Scholar]
  117. Langacker, Ronald W.
    1987Foundations of Cognitive Grammar I: Theoretical prerequisites. Stanford, CA: Stanford University Press.
    [Google Scholar]
  118. Langenhorst, Jan , Yannick Frommherz , & Simon Meier-Vieracker
    2023 Keyness in song lyrics: Challenges of highly clumpy data. Journal for Language Technology and Computational Linguistics36(1). 21–38. 10.21248/jlcl.36.2023.236
    https://doi.org/10.21248/jlcl.36.2023.236 [Google Scholar]
  119. Leech, Geoffrey , Paul Rayson , & Andrew Wilson
    2001Word frequencies in written and spoken English: Based on the British National Corpus. London: Longman.
    [Google Scholar]
  120. Leech, Geoffrey & Roger Fallon
    1992 Computer corpora – What do they tell us about culture?ICAME Journal16. 29–50.
    [Google Scholar]
  121. Lester, Nicholas A.
    2017 The syntactic bits of nouns: How prior syntactic distributions affect comprehension, production, and acquisition. Ph.D. dissertation, University of California, Santa Barbara.
  122. Lester, Nicholas A. , Daniel Baum , & Tirza Biron
    2018 Phonetic duration of nouns depends on de-lexicalized syntactic distributions: Evidence from naturally occurring conversation. In Chuck Kalish , Martina Rau , Jerry Zhu , & Timothy Rogers (eds.), Proceedings of the 40th Annual Conference of the Cognitive Science Society, 2035–2040. Madison, WI.
    [Google Scholar]
  123. Lester, Nicholas A. , Laurie B. Feldman , & Fermín Moscoso del Prado Martín
    2017 You can take a noun out of syntax…: Syntactic similarity effects in lexical priming. In Glenn Gunzelmann , Andrew Howes , Thora Tenbrink , & Eddy Davelaar (eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society, 2537–2542. London, UK.
    [Google Scholar]
  124. Lester, Nicholas A. & Fermín Moscoso del Prado Martín
    2017 Syntactic flexibility in the noun: evidence from picture naming. In Anna Papafragou , Daniel Grodner , Daniel Mirman , & John C. Trueswell (eds.), Proceedings of the 38th Annual Conference of the Cognitive Science Society, 2585–2590. Philadelphia, PA.
    [Google Scholar]
  125. Liebetrau, Albert M.
    1983Measures of association. Beverly Hills, CA: Sage. 10.4135/9781412984942
    https://doi.org/10.4135/9781412984942 [Google Scholar]
  126. Lijffijt, Jefrey & Stefan Th. Gries
    2012 Correction to “Dispersions and adjusted frequencies in corpora”. International Journal of Corpus Linguistics17(1). 147–149. 10.1075/ijcl.17.1.08lij
    https://doi.org/10.1075/ijcl.17.1.08lij [Google Scholar]
  127. Lim, Zheng Wei , Harry Stuart , Simon De Deyne , Terry Regier , Ekaterina Vylomova , Trevor Cohn , & Charles Kemp
    2022 A computational approach to discovering cultural keywords across languages. PsyArXiv, last edited22 Nov 2022.
    [Google Scholar]
  128. Linzen, Tal & T. Florian Jaeger
    2015 Uncertainty and expectation in sentence processing: Evidence from subcategorization distributions. Cognitive Science40(6). 1382–1411. 10.1111/cogs.12274
    https://doi.org/10.1111/cogs.12274 [Google Scholar]
  129. Linzen, Tal , Alec Marantz , & Liina Pylkkänen
    2013 Syntactic context in visual world recognition: An MEG study. The Mental Lexicon8(2). 117–139. 10.1075/ml.8.2.01lin
    https://doi.org/10.1075/ml.8.2.01lin [Google Scholar]
  130. Mahlberg, Michaela
    2008 Clusters, key clusters and local textual functions in Dickens. Corpora 2(1). 1–31.
    https://doi.org/10.3366/cor.2007.2.1.1
  131. McConnell, Kyla & Alice Blumenthal-Dramé
    2022 Effects of task and corpus-derived association scores on the online processing of collocations. Corpus Linguistics and Linguistic Theory 18(1). 33–76.
    https://doi.org/10.1515/cllt-2018-0030
  132. McDonald, Scott A. & Richard C. Shillcock
    2001 Rethinking the word frequency effect: The neglected role of distributional information in lexical processing. Language and Speech 44(3). 295–323.
    https://doi.org/10.1177/00238309010440030101
  133. McEnery, Anthony, Richard Xiao, & Yukio Tono
    2006 Corpus-based language studies: An advanced resource book. London & New York: Routledge.
  134. Mehl, Seth
    2021 What we talk about when we talk about corpus frequency: The example of polysemous verbs with light and concrete senses. Corpus Linguistics and Linguistic Theory 17(1). 223–247.
    https://doi.org/10.1515/cllt-2017-0039
  135. Michelbacher, Lukas, Stefan Evert, & Hinrich Schütze
    2011 Asymmetry in corpus-derived and human word associations. Corpus Linguistics and Linguistic Theory 7(2). 245–276.
    https://doi.org/10.1515/cllt.2011.012
  136. Mildenberger, Thoralf
    2023 Assessing keyness using permutation tests. arXiv:2308.13383v1, last accessed 25 Aug 2023.
  137. Milin, Petar, Dusica Filipović-Đurđević, & Fermín Moscoso del Prado Martín
    2009 The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian. Journal of Memory and Language 60(1). 50–64.
    https://doi.org/10.1016/j.jml.2008.08.007
  138. Milin, Petar, Victor Kuperman, Aleksandar Kostić, & R. Harald Baayen
    2009 Words and paradigms bit by bit: An information-theoretic approach to the processing of inflection and derivation. In James P. Blevins & Juliette Blevins (eds.), Analogy in grammar: Form and acquisition, 214–252. Oxford: Oxford University Press.
    https://doi.org/10.1093/acprof:oso/9780199547548.003.0010
  139. Millar, Neil & Brian S. Budgell
    2008 The language of public health – A corpus-based analysis. Journal of Public Health 16(5). 369–374.
    https://doi.org/10.1007/s10389-008-0178-9
  140. Mollin, Sandra
    2009 Combining corpus linguistic and psychological data on word co-occurrences: Corpus collocates versus word associations. Corpus Linguistics and Linguistic Theory 5(2). 175–200.
    https://doi.org/10.1515/CLLT.2009.008
  141. Monroe, Burt L., Michael P. Colaresi, & Kevin M. Quinn
    2008 Fightin’ words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis 16(4). 372–403.
    https://doi.org/10.1093/pan/mpn018
  142. Monsell, Stephen
    1991 The nature and locus of word frequency effects in reading. In Derek Besner & Glyn W. Humphreys (eds.), Basic processes in reading: Visual word recognition, 148–197. Hillsdale, NJ: Lawrence Erlbaum Associates.
  143. Moran, Matthew D.
    2003 Arguments for rejecting sequential Bonferroni in ecological studies. OIKOS 100. 403–405.
    https://doi.org/10.1034/j.1600-0706.2003.12010.x
  144. Morrison, Catriona M., Andrew W. Ellis, & Philip T. Quinlan
    1992 Age of acquisition, not word frequency, affects object naming, not object recognition. Memory and Cognition 20. 705–714.
    https://doi.org/10.3758/BF03202720
  145. Mukherjee, Joybrato & Tobias Bernaisch
    2015 Cultural keywords in context: A pilot study of linguistic acculturation in South Asian Englishes. In Peter Collins (ed.), Grammatical change in English world-wide, 411–435. Amsterdam: John Benjamins.
    https://doi.org/10.1075/scl.67.17muk
  146. Nakagawa, Shinichi
    2004 A farewell to Bonferroni: The problems of low statistical power and publication bias. Behavioral Ecology 15(6). 1044–1045.
    https://doi.org/10.1093/beheco/arh107
  147. Nelson, Robert
    2023 Too noisy at the bottom: Why Gries’ (2008, 2020) dispersion measures cannot identify unbiased distributions of words. Journal of Quantitative Linguistics 30(2). 153–166.
    https://doi.org/10.1080/09296174.2023.2172711
  148. Nenadić, Filip, Petar Milin, & Benjamin V. Tucker
    2021 Relative entropy effects on the processing of spoken Romanian verbs. The Mental Lexicon 16(1). 23–48.
    https://doi.org/10.1075/ml.20010.nen
  149. Oakes, Michael & Malcolm Farrow
    2007 Use of the chi-squared test to examine vocabulary differences in English language corpora representing seven different countries. Literary and Linguistic Computing 22(1). 85–99.
    https://doi.org/10.1093/llc/fql044
  150. Oldfield, R. & A. Wingfield
    1965 Response latencies in naming objects. Quarterly Journal of Experimental Psychology A(17). 273–281.
    https://doi.org/10.1080/17470216508416445
  151. Onnis, Luca, Padraic Monaghan, Morten H. Christiansen, & Nick Chater
    2004 Variability is the spice of learning, and a crucial ingredient for detecting and generalizing in nonadjacent dependencies. In Proceedings of the 26th Annual Meeting of the Cognitive Science Society, 1678–1683.
  152. Paquot, Magali
    2010 Academic vocabulary in learner writing: From extraction to analysis. London & New York: Continuum.
  153. Paquot, Magali
    2013 Lexical bundles and transfer effects. International Journal of Corpus Linguistics 18(3). 391–417.
    https://doi.org/10.1075/ijcl.18.3.06paq
  154. Paquot, Magali
    2014 Cross-linguistic influence and formulaic language: Recurrent word sequences in French learner writing. In Leah Roberts, Ineke Vedder, & Jan H. Hulstijn (eds.), Eurosla Yearbook, Vol. 14, 216–237. Amsterdam: John Benjamins.
  155. Paquot, Magali
    2017 L1 frequency in foreign language acquisition: Recurrent word combinations in French and Spanish EFL learner writing. Second Language Research 33(1). 13–32.
    https://doi.org/10.1177/0267658315620265
  156. Paquot, Magali & Yves Bestgen
    2009 Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction. In Andreas Jucker, Daniel Schreier, & Marianne Hundt (eds.), Corpora: Pragmatics and discourse, 247–269. Amsterdam: Rodopi.
    https://doi.org/10.1163/9789042029101_014
  157. Paulsen, Mikkel Ekeland
    To appear. Assessing word commonness: Adding dispersion to frequency. International Journal of Corpus Linguistics.
  158. Pecina, Pavel
    2010 Lexical association measures and collocation extraction. Language Resources and Evaluation 44(1–2). 137–158.
    https://doi.org/10.1007/s10579-009-9101-4
  159. Pedersen, Ted
    1996 Fishing for exactness. In Proceedings of the South-Central SAS Users Group Conference (SCSUG-96), 27–29 October 1996, Austin, TX.
  160. Pojanapunya, Punjaporn & Richard Watson Todd
    2018 Log-likelihood and odds ratio: Keyness statistics for different purposes of keyword analysis. Corpus Linguistics and Linguistic Theory 14(1). 133–167.
    https://doi.org/10.1515/cllt-2015-0030
  161. Rayner, Keith & Susan A. Duffy
    1986 Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory and Cognition 14(3). 191–201.
    https://doi.org/10.3758/BF03197692
  162. Rayson, Paul, Damon Berridge, & Brian J. Francis
    2004 Extending the Cochran rule for the comparison of word frequencies between corpora. In Gérald Purnelle, Cédrick Fairon, & Anne Dister (eds.), Le poids des mots: Proceedings of the 7th International Conference on Statistical analysis of textual data, Vol. II, 926–936. Louvain-la-Neuve: Presses Universitaires de Louvain.
  163. Rayson, Paul & Amanda Potts
    2020 Analysing keyword lists. In Magali Paquot & Stefan Th. Gries (eds.), Practical handbook of corpus linguistics, 119–139. Berlin: Springer.
    https://doi.org/10.1007/978-3-030-46216-1_6
  164. Resnik, Philip
    1996 Selectional constraints: An information-theoretic model and its computational realization. Cognition 61(1–2). 127–159.
    https://doi.org/10.1016/S0010-0277(96)00722-6
  165. Rogers, Phillip G. & Stefan Th. Gries
    2022 Grammatical gender disambiguates syntactically similar nouns. Entropy 24(4). 520.
    https://doi.org/10.3390/e24040520
  166. Römer, Ute & Stefanie Wulff
    2008 Applying corpus methods to written academic texts: Explorations of MICUSP. Journal of Writing Research 2(2). 99–127.
    https://doi.org/10.17239/jowr-2010.02.02.2
  167. Rosengren, Inger
    1971 The quantitative concept of language and its relation to the structure of frequency dictionaries. Études de Linguistique Appliquée (Nouvelle Série) 1. 103–127.
  168. Savický, Petr & Jaroslava Hlaváčová
    2002 Measures of word commonness. Journal of Quantitative Linguistics 9(3). 215–231.
    https://doi.org/10.1076/jqul.9.3.215.14124
  169. Schmid, Hans Joerg
    2010 Entrenchment, salience, and basic levels. In Dirk Geeraerts & Hubert Cuyckens (eds.), The Oxford handbook of cognitive linguistics, 117–138. Oxford: Oxford University Press.
  170. Schooler, Lael J. & John R. Anderson
    1997 The role of process in the rational analysis of memory. Cognitive Psychology 32(3). 219–250.
    https://doi.org/10.1006/cogp.1997.0652
  171. Schneider, Ulrike
    2020 ΔP as a measure of collocation strength: Considerations based on analyses of hesitation placement. Corpus Linguistics and Linguistic Theory 16(2). 249–274.
  172. Schuchardt, Hugo
    1885 Über die Lautgesetze: Gegen die Junggrammatiker. Berlin.
  173. Scott, Mike
    1997 PC analysis of key words – And key words. System 25(2). 233–245.
    https://doi.org/10.1016/S0346-251X(97)00011-0
  174. Scott, Mike & Christopher Tribble
    2006 Textual patterns: Key words and corpus analysis in language education. Amsterdam: John Benjamins.
    https://doi.org/10.1075/scl.22
  175. Seidenberg, Mark S. & Maryellen C. MacDonald
    1999 A probabilistic constraints approach to language acquisition and processing. Cognitive Science 23(4). 569–588.
    https://doi.org/10.1207/s15516709cog2304_8
  176. Sheskin, David
    2011 Handbook of parametric and non-parametric statistical procedures. 5th ed. Boca Raton, FL: Taylor & Francis.
  177. Shlens, Jonathon
    2014 Notes on Kullback-Leibler Divergence and Likelihood Theory. arXiv preprint, 1404.2000v1, 8 April 2014.
  178. Sinclair, John M.
    1996 The search for units of meaning. Textus 9(1). 75–106.
  179. Siyanova-Chanturia, Anna
    2015 Collocation in beginner learner writing: A longitudinal study. System 53. 148–160.
    https://doi.org/10.1016/j.system.2015.07.003
  180. Sönning, Lukas
    2024 Evaluation of keyness metrics: Performance and reliability. Corpus Linguistics and Linguistic Theory 20(2). 263–288.
    https://doi.org/10.1515/cllt-2022-0116
  181. Spärck Jones, Karen
    1972 A statistical interpretation of term specificity and its application in information retrieval. Journal of Documentation 28(1). 11–21.
    https://doi.org/10.1108/eb026526
  182. Stefanowitsch, Anatol & Stefan Th. Gries
    2003 Collostructions: Investigating the interaction between words and constructions. International Journal of Corpus Linguistics 8(2). 209–243.
    https://doi.org/10.1075/ijcl.8.2.03ste
  183. Stubbs, Michael
    1995 Collocations and semantic profiles: On the cause of the trouble with quantitative methods. Functions of Language 2(1). 23–55.
    https://doi.org/10.1075/fol.2.1.03stu
  184. Stubbs, Michael
    1996 Text and corpus analysis: Computer-assisted studies of language and culture. Oxford: Blackwell.
  185. Suethanapornkul, Sakol & Sarut Supasiraprapa
    To appear. Usage events and constructional knowledge: A study of two variants of the introductory-it construction. Studies in Second Language Acquisition.
  186. Sun, Hao & Jean-Pierre Koenig
    2017 There are more valence alternations than the ditransitive. In Julia Nee, Margaret Cychosz, Dmetri Hayes, Tyler Lau, & Emily Remirez (eds.), Proceedings of the 43rd Meeting of the Berkeley Linguistics Society, 291–308. Berkeley, CA: Berkeley Linguistics Society.
  187. Tomokiyo, Takashi & Matthew Hurst
    2003 A language model approach to keyphrase extraction. In Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, 33–40. Stroudsburg, PA.
    https://doi.org/10.3115/1119282.1119287
  188. Tribble, Christopher
    2002 Small corpora and teaching writing: Towards a corpus-informed pedagogy of writing. In Mohsen Ghadessy, Alex Henry, & Robert L. Roseberry (eds.), Small corpus studies and ELT: Theory and practice, 381–408. Amsterdam: John Benjamins.
  189. Tucker, Benjamin V., Daniel Brenner, D. Kyle Danielson, Matthew C. Kelley, Filip Nenadić, & Michelle Sims
    2019 The Massive Auditory Lexical Decision (MALD) database. Behavior Research Methods 51. 1187–1204.
    https://doi.org/10.3758/s13428-018-1056-1
  190. van Heuven, Walter J. B., Pawel Mandera, Emmanuel Keuleers, & Marc Brysbaert
    2014 SUBTLEX-UK: A new and improved word frequency database for British English. The Quarterly Journal of Experimental Psychology 67(6). 1176–1190.
    https://doi.org/10.1080/17470218.2013.850521
  191. VanPatten, Bill, Jessica Williams, Gregory D. Keating, & Stefanie Wulff
    2020 Introduction: The nature of theories. In Bill VanPatten, Gregory D. Keating, & Stefanie Wulff (eds.), Theories in second language acquisition: An introduction, 1–17. New York, NY: Routledge.
    https://doi.org/10.4324/9780429503986-1
  192. Weisberg, Herbert F.
    1974 Models of statistical relationship. The American Political Science Review 68(4). 1638–1655.
    https://doi.org/10.2307/1959947
  193. Wettler, Manfred, Reinhard Rapp, & Peter Sedlmeier
    2005 Free word associations correspond to contiguities between words in texts. Journal of Quantitative Linguistics 12(2–3). 111–122.
    https://doi.org/10.1080/09296170500172403
  194. Wilcox, Allen R.
    1973 Indices of qualitative variation and political measurement. The Western Political Quarterly 26(2). 325–343.
    https://doi.org/10.1177/106591297302600209
  195. Zhai, Chengxiang & John Lafferty
    2004 A study of smoothing methods for language models. ACM Transactions on Information Systems 22(2). 179–214.
    https://doi.org/10.1145/984321.984322
  196. Zipf, George K.
    1935 The psycho-biology of language. Boston, MA: Houghton Mifflin Harcourt.