Abstract

This paper discusses the creation and use of the Coronavirus Corpus, which is currently (March 2021) 900 million words in size, and which will probably be about one billion words in size by May–June 2021. The Coronavirus Corpus is a subset of the NOW Corpus (News on the Web), which is currently about 12.1 billion words in size and which grows by about two billion words each year. These two corpora are updated every night, with about 6–10 million words for NOW and 2–3 million words for the Coronavirus Corpus. The Coronavirus Corpus allows users to see the frequency of words and phrases over time (even by individual day), and users can find all words that are more frequent in one time period than another. Users can also see the collocates for words and phrases, and compare the collocates to see what is being said about particular topics over time.

/content/journals/10.1075/ijcl.21044.dav
2021-05-03
2021-05-07
  • Article Type: Research Article
Keywords: text archive; Coronavirus; COVID-19; corpus design; NOW corpus