Volume 3, Issue 1
  • ISSN 2542-9477
  • E-ISSN: 2542-9485



The growing popularity of keyword analysis as an applied linguistics methodology has not been matched by an increase in the rigour with which the method is applied. While several studies have investigated the impact of choices made at certain stages of the keyword analysis process, the impact of the choice of benchmark corpus has largely been overlooked. In this paper, we compare a target corpus with several benchmark corpora and show that the keywords generated are different. We also show that certain characteristics of the keyword list and of the keywords themselves vary in relatively predictable ways depending on the benchmark corpus. These variations have implications for the choice of benchmark corpus and how the results of a keyword analysis should be interpreted. Analyzing the keywords from a comparison with a large general corpus or the keyword lists from multiple comparisons may be most appropriate for register studies.






  • Article Type: Research Article
  • Keyword(s): aboutness; keyword analysis; reference corpus; register