Volume 10, Issue 1
  • ISSN: 0155-0640
  • E-ISSN: 1833-7139


This paper provides an overview of English-language corpora. It examines the relationships among the extant corpora and outlines some of the features of a corpus of written English being developed in Australia. The article also considers some of the linguistic and theoretical constraints on corpus-based research.


  • Article Type: Research Article