The Representativeness of Czech corpora
- Source: International Journal of Corpus Linguistics, Volume 10, Issue 3, Jan 2005, p. 357 - 366
Abstract
The attempt to balance corpora with respect to their future usage led to the introduction of the term expectations (Králík 2001b). On the basis of several statistical inquiries into such expectations, the textual structure of SYN2000, which is the synchronic part of the Czech National Corpus (CNC), was proposed and realised. The present article explains the original composition briefly and discusses two new inquiries concerning expectations (A-2001 and C-2001). Important corrections for future work on the CNC are suggested. The expectations concerning newspapers changed radically during 1996–2001. Within the same period, an obvious rise of interest in fiction can be detected. The reasons for these developments can be traced to trends in Czech society. Thus, we have proposed a considerable reduction in the proportion of newspaper texts and a large increase in the proportion of fiction texts. According to new searches, more detailed percentages for specific subject areas are suggested.