Volume 28, Issue 3
  • ISSN: 1384-6655
  • E-ISSN: 1569-9811



The article investigates the two main corpus indicators of word commonness, frequency and dispersion, through a cross-validation analysis of frequency and four dispersion measures (‘Range’, ‘Chi-squared’, ‘Deviation of Proportions’ and ‘Juilland’s D’). This approach estimates how well each of these measures predicts the distribution of corpus items in an extracted language sample. Based on a dataset of 273 Norwegian compounds, the results show that Deviation of Proportions in particular is a robust measure of dispersion that can be used together with frequency to substantiate claims about word commonness based on corpus data. In addition, dispersion measures reflect not only the kind of distribution from which the frequency statistic is derived, but also how reliably the frequency observed in the corpus sample represents frequency in the language variety from which the corpus is sampled.
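To make the four dispersion measures concrete, the following Python sketch computes them for a single word from its frequency in each corpus part. It is an illustration under stated assumptions, not the article's own code: the names `dispersion` and `Dispersion` are hypothetical, the corpus is assumed to be divided into parts of known size (in tokens), and Juilland's D is assumed to be computed on per-part relative frequencies so that unequal part sizes are handled. Deviation of Proportions (DP) is computed as half the sum of the absolute differences between each part's share of the word's occurrences and that part's share of the whole corpus.

```python
"""Sketch: four dispersion measures for one word across corpus parts.

Assumptions (not taken from the article): part sizes in tokens are known,
and Juilland's D is computed on per-part relative frequencies using the
population standard deviation.
"""
import math
from dataclasses import dataclass


@dataclass
class Dispersion:
    range_: int          # number of parts in which the word occurs
    chi_squared: float   # deviation from the size-proportional expectation
    dp: float            # Deviation of Proportions: 0 = even, near 1 = clumped
    juilland_d: float    # 1 = perfectly even, 0 = confined to one part


def dispersion(counts: list[int], part_sizes: list[int]) -> Dispersion:
    """Compute Range, chi-squared, DP and Juilland's D for one word.

    counts[i] is the word's frequency in part i; part_sizes[i] is the
    size of part i in tokens (hypothetical input format).
    """
    if len(counts) != len(part_sizes) or len(counts) < 2:
        raise ValueError("need matching counts and part sizes for >= 2 parts")
    f = sum(counts)
    if f == 0:
        raise ValueError("word does not occur in the corpus")
    corpus_size = sum(part_sizes)
    n = len(counts)

    # Expected share of each part = its share of the whole corpus.
    s = [size / corpus_size for size in part_sizes]
    # Observed share of the word's occurrences falling in each part.
    v = [c / f for c in counts]

    # Range: how many parts contain the word at all.
    range_ = sum(1 for c in counts if c > 0)

    # Chi-squared against the expectation f * s_i for each part.
    chi_squared = sum((c - f * si) ** 2 / (f * si) for c, si in zip(counts, s))

    # DP: half the summed absolute differences between observed and expected shares.
    dp = 0.5 * sum(abs(vi - si) for vi, si in zip(v, s))

    # Juilland's D: 1 - CV / sqrt(n - 1), on relative frequencies per part.
    rel = [c / size for c, size in zip(counts, part_sizes)]
    mean = sum(rel) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in rel) / n)  # population SD
    juilland_d = 1 - (sd / mean) / math.sqrt(n - 1)

    return Dispersion(range_, chi_squared, dp, juilland_d)
```

For a word that occurs five times in each of the first two of four equal-sized parts, `dispersion([5, 5, 0, 0], [1000] * 4)` gives a range of 2, a chi-squared value of 10, a DP of 0.5 and a Juilland's D of about 0.42. A perfectly even word (`[5, 5, 5, 5]`) gives DP = 0 and D = 1, while a word confined to one part (`[10, 0, 0, 0]`) gives DP = 0.75 and D = 0. Note that the two measures run in opposite directions: low DP and high D both indicate even dispersion.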






  • Article Type: Research Article
  • Keyword(s): cross-validation; dispersion; frequency; lexicography; word commonness