International Journal of Corpus Linguistics - Volume 2, Issue 2, 1997
-
Czech National Corpus: A Case in Many Contexts
Author(s): František Čermák
pp.: 181–197 (17)
Against the background of some of the major linguistic problems which demand our attention and which should point to some badly-needed criteria, the brief history and structure of the Czech National Corpus is outlined. The points seen as open include differences between various languages in their degree of explicitness, form-function relation, ellipsis, etc. It is argued that a more general and language-independent approach is necessary to handle, among other things, the multi-word units of the text; a general corpus maintenance and query system available to the increasing number of would-be users is required, too. The particular Czech solution, still being worked out and gradually implemented, is described in some detail.
-
Text Categories and Where You Can Stick Them: A Crude Formality Index
Author(s): Robert J. Sigley
pp.: 199–237 (39)
This paper applies principal components analysis (PCA) to solve the problem of interpreting pre-existing corpus text categories for analysis of linguistic variation. The method is illustrated by constructing an index of the complex notion "formality" from PCA of a set of high-frequency wordform-based counts. The first principal component from this analysis acts as a broad formality index; a second principal component is tentatively identified as marking "concrete facts" versus "abstract discussion". Subsequently, text categories from the corpora are positioned on these textual dimensions, and selected categories are evaluated for internal consistency by comparing the distribution of texts across subcategories. Finally, suggestions are made concerning further developments and applications of the method used here, and its implications for the use of corpora in variation studies.
-
Annotating the Contemporary Chinese Corpus
Author(s): Qiang Zhou and Shiwen Yu
pp.: 239–258 (20)
In recent years, great progress has been made in Chinese corpus processing. A fifty-million-word Chinese National Corpus project has been put into effect, and many automatic corpus processing programs have also been developed. In this paper, we will briefly introduce our work on constructing a large scale annotated corpus for Chinese grammatical research and developing a Chinese Corpus Multilevel Processing system (CCMP). First, we present our annotation scheme. Second, we discuss some basic methodologies for Chinese corpus analysis and propose a man-machine mutually dependent corpus processing model. Finally, we introduce the survey of our CCMP. We hope our work will give impetus to further research in Chinese corpus linguistics.
-
Predictability of Word Forms (Types) and Lemmas in Linguistic Corpora. A Case Study Based on the Analysis of the CUMBRE Corpus: An 8-Million-Word Corpus of Contemporary Spanish
Author(s): Aquilino Sánchez and Pascual Cantos-Gomez
pp.: 259–280 (22)
Various research centres and publishing companies all around the world have been developing corpus resources for many years, and there has been a growing awareness throughout the eighties of their importance to linguistic and lexicographic work. To give some idea of scale, the British National Corpus contains 100 million words, and its counterpart for Spanish (compiled by the Spanish Real Academia de la Lengua) will reach 100 million words at first and 200 million words in a second stage. However, little convincing research has been done in the direction of sample size, which is directly connected to a further topic: representativeness. We shall investigate here a related issue: Is it possible to predict the different word forms and lemmas of a given corpus? And if so, how? A positive answer to this question may contribute to decision making regarding some aspects of representativeness in given fields. We shall attempt further to find a reliable procedure to predict the total number of word forms (types) and lemmas in a specific corpus.