International Journal of Corpus Linguistics - Volume 8, Issue 1, 2003
-
From translational data to contrastive knowledge
Author(s): Olivier Kraif
pp.: 1–29 (29)
Textual aligning consists in pairing segments (e.g. sentences or phrases) that are translational equivalents across corpora of translations. An interesting application of textual aligning is the automatic extraction of bilingual lexicons. As it has been pointed out during previous evaluation campaigns, such as Arcade, lexical aligning remains problematic. In order to solve problems of consistency linked with the concept of translational compositionality, a redefinition of lexical aligning task is proposed, introducing the concept of lexical correspondence. Simple techniques dedicated to lexical correspondences extraction are then evaluated. Thus, it appears that adapted statistical filters allow to extract very accurately significant regularities that are relevant at the contrastive level. More generally, these methods prove to be adapted not only for bilingual lexicons extraction: they could be used to study a wide range of contrastive phenomena on empirical basis.
-
Testing the sub-test
Author(s): Stefan Th. Gries
pp.: 31–61 (31)
This paper pursues two objectives, one linguistic and one methodological in nature. First, it is concerned with a corpus-based analysis of the degree to which pairs of -ic/-ical adjectives (e.g. classic/classical) are synonymous. Second, it investigates whether Church et al.'s (1994) sub-test can be fruitfully applied to this phenomenon. As to the first issue, I conclude that individual -ic/-ical adjectives can be located on a continuum of semantic similarity, with some being virtually completely synonymous and others being strongly differentiated; several semantic and distributional distinctions between members of adjective pairs are pointed out on the basis of distinctive collocates. As to the second question, I demonstrate on the basis of a simulation that the sub-test is conceptually adequate, but suffers from its asymptotic approach, which is why Fisher-exact is argued to be a more adequate diagnostic.
-
SPAACy – A semi-automated tool for annotating dialogue acts
Author(s): Martin Weisser
pp.: 63–74 (12)
This article reports on a pilot project which aims at creating a speech-act annotated training corpus for service dialogue systems. In order to achieve this aim, an annotation tool, which allows us to automate large parts of the annotation, is being developed. This tool converts text-based transcriptions into XML and applies different levels of markup to each dialogue, so that there remains as little post-editing to be done as possible. The project also aims at developing a relatively generic mark-up scheme that may be applied to different domains without needing a large degree of adaptation. This article describes aspects of the grammar ‘controlling/governing’ the tool and how this grammar ‘interacts’ with the general strategies employed in the annotation.
-
A usage-based approach to argument structure
Author(s): Hongyin Tao
pp.: 75–95 (21)
The English verbs remember and forget are typically treated by syntacticians as mental process verbs whose argument structure is characterized by a variety of possible complements. Based on extensive corpus data, Tao (2001) investigates the use of remember in spoken English and proposes that complement-taking is actually a marginal feature of remember, and remember can be seen as undergoing changes toward becoming a discourse particle in spoken English. This paper extends the previous study by bringing in forget for comparison against remember. It is shown that while both remember and forget disprefer complements, forget lacks the placement flexibility seen in remember but allows more tense options; at the same time, forget also has its own pragmatically strengthened patterns in such combinations as ‘forget it’ and ‘don't forget to.’ Overall this study shows that not only can usage-based investigations provide a realistic account of argument structure, a usage-based approach is also instrumental in elucidating the varied local patterns that are often confined to individual linguistic entities or sequences in highly specified contexts, a conclusion which supports the emergent view of argument structure.
-
The textlinguistic dimension of corpus linguistics
Author(s): Michaela Mahlberg
pp.: 97–108 (12)
Corpus research can provide important insights into different areas of language description. The present paper takes a textlinguistic approach to the description of English and puts into perspective the ‘support function’ of general nouns such as man, move and thing. The support function captures various ways in which general nouns are used to present information appropriately in a given context. Specifically, three aspects of the support function are discussed: ‘giving emphasis’, ‘adding information in passing’ and ‘providing an introduction’. From a more theoretical point of view, the present paper argues for an integration of the pattern grammar approach with a textlinguistic perspective.
-
Automatic extraction of meaningful units from corpora
Author(s): Pernilla Danielsson
pp.: 109–127 (19)
In this article, we will reconsider the notion of a word as the basic unit of analysis in language and propose that in an information and meaning carrying system the unit of analysis should be a unit of meaning (UM). Such a UM may consist of one or more words. A method will be promoted that attempts to automatically retrieve UMs from corpora. To illustrate the results that may be obtained by this method, the node word ‘stroke’ will be used in a small study. The results will be discussed, with implications considered for both monolingual and multilingual use. The monolingual study will benefit from using the British National Corpus, while the multilingual study introduces a parallel corpus consisting of Swedish novels and their translations into English.