Volume 26, Issue 3
  • ISSN: 1384-6655
  • E-ISSN: 1569-9811



Keyness is a commonly used method in corpus linguistics and is assumed to identify key items that are characteristic of one corpus when compared to another. This paper puts this assumption to the test by comparing case study corpora in the fields of genetic, immunological and psychiatric biomedical association studies, using what we refer to as a ‘K-FLUX’ analysis to produce a set of key items. Experts from within these fields are asked to evaluate the extent to which the identified key items are characteristic of their discipline. The paper concludes that fewer than 50% of the items identified by the method are rated as highly characteristic by experts, and that this proportion varies between types of association study. Further, it is difficult to reach a consensus on what counts as ‘characteristic’, which poses a challenge to the ultimate aim of the keyness method. The paper demonstrates the value of supporting corpus linguistic studies with expert assessments to evaluate whether (and which) items can be said to be indicative of a particular field.
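The abstract does not specify which statistic underlies the ‘K-FLUX’ analysis. Purely as an illustration of how a keyness comparison between a study corpus and a reference corpus is commonly computed, the following minimal sketch assumes the log-likelihood (G2) statistic. The function names, the toy word lists, and the 3.84 threshold (the conventional chi-square critical value for p < 0.05 with one degree of freedom) are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) of one item's frequency across two corpora.

    Assumed statistic for illustration; not necessarily the paper's method.
    """
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Expected frequencies if the item were equally likely in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def key_items(study_tokens, ref_tokens, threshold=3.84):
    """Return (item, G2) pairs that are overused in the study corpus."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = sum(study.values()), sum(ref.values())
    results = []
    for item, freq in study.items():
        g2 = log_likelihood(freq, n_study, ref[item], n_ref)
        # Keep only items relatively more frequent in the study corpus.
        overused = freq / n_study > ref[item] / n_ref
        if overused and g2 >= threshold:
            results.append((item, g2))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpora, invented for this example.
study_corpus = ["allele"] * 40 + ["the"] * 60
reference_corpus = ["allele"] * 2 + ["the"] * 98
for item, score in key_items(study_corpus, reference_corpus):
    print(item, round(score, 2))
```

A high score only indicates that an item's frequency differs between the two corpora. Whether the item is actually characteristic of a field is the question the paper puts to expert raters.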






  • Article Type: Research Article
  • Keyword(s): biomedical; characteristic; evaluation; key items; keyness