International Journal of Corpus Linguistics - Volume 9, Issue 1, 2004
-
Automatic acquisition of verb subcategorization information by exploiting minimal linguistic resources
Author(s): Katia Lida Kermanidis, Nikos Fakotakis and George Kokkinakis
pp. 1–28
A set of well-known statistical filtering methods (binomial hypothesis testing, log-likelihood ratio, t-test, thresholds on relative frequencies) is applied to Modern Greek and English corpora in order to automatically acquire verb subcategorization frames that are neither limited in number nor known beforehand. As sophisticated linguistic resources and tools are not available for most languages (including Modern Greek), pre-processing of our corpora goes no further than elementary, intrasentential, non-embedded phrase chunking. Valid syntactic frames of verbs are detected by forming, permuting and counting subsets of the verb's neighboring phrases and then applying the statistical filters listed above. The results were comparable to, and in several cases better than, those of previous approaches, even approaches that utilize richer resources. When the extracted list of frames is incorporated into a shallow parser, the parser's performance increases by almost 6%, which demonstrates the importance of the acquired knowledge.
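The frame-detection step described above can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes each verb occurrence has already been chunked into a list of phrase labels such as "NP" or "PP", treats every non-empty subset of those labels (up to a hypothetical `max_size`) as a candidate frame, and applies only one of the four filters named in the abstract, a binomial hypothesis test in which `p_err` (the assumed probability that a frame co-occurs with the verb through chunking noise) and `alpha` are hypothetical parameters.

```python
from collections import Counter
from itertools import combinations
from math import comb


def candidate_frames(chunks, max_size=3):
    """Return every non-empty subset (up to max_size labels) of the phrase
    chunks found around one verb occurrence. Labels are sorted first, so
    permutations of the same chunks collapse into a single candidate frame."""
    labels = sorted(chunks)
    frames = set()
    for k in range(1, min(max_size, len(labels)) + 1):
        frames.update(combinations(labels, k))
    return frames


def binomial_tail(m, n, p):
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))


def accept_frames(occurrences, p_err=0.05, alpha=0.05, max_size=3):
    """Binomial filter with hypothetical parameters p_err and alpha.

    occurrences: one list of chunk labels per occurrence of a single verb.
    A candidate frame seen m times in n occurrences is kept when
    P(X >= m) under Binomial(n, p_err) falls below alpha, i.e. when noise
    alone is unlikely to explain how often it co-occurs with the verb.
    """
    n = len(occurrences)
    counts = Counter()
    for chunks in occurrences:
        # Each distinct candidate frame is counted once per occurrence.
        counts.update(candidate_frames(chunks, max_size))
    return {
        frame: count
        for frame, count in counts.items()
        if binomial_tail(count, n, p_err) < alpha
    }


# Invented toy input: 8 occurrences with NP+PP, 2 with NP only, 1 with ADVP.
occurrences = [["NP", "PP"]] * 8 + [["NP"]] * 2 + [["ADVP"]]
print(accept_frames(occurrences))
```

Sorting the labels before taking subsets is what merges permutations of the same chunks; the log-likelihood ratio, the t-test or a relative-frequency threshold could be substituted for `binomial_tail` at the same point in the filter.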
-
Clause alignment for Hong Kong legal texts: A lexical-based approach
Author(s): Chunyu Kit, Jonathan J. Webster, King-Kui Sin, Haihua Pan and Heng Li
pp. 29–51
In this paper we report on our recent work on clause alignment for English-Chinese bilingual legal texts using available lexical resources, including a bilingual legal glossary and a bilingual dictionary, in order to acquire examples at various linguistic levels for example-based machine translation. We present our formulation of a similarity measure for a candidate pair of clauses based on their matched lexical items, together with an effective clause alignment algorithm built on this measure. Experimental results show that the similarity measure and the lexical-based clause alignment algorithm, though very simple, are highly effective, achieving 94.6% alignment accuracy. This confirms our intuition that lexical information gives a reliable indication of correct alignment. The significance of this lexical-based approach lies in both its simplicity and its effectiveness.
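A lexical-based similarity measure and an alignment algorithm built on it might look roughly like the sketch below. The abstract does not give the actual formula, so every concrete choice here is an assumption: the Dice-style score, the monotone dynamic-programming alignment allowing 1-1, 1-0 and 0-1 links, and the `lexicon` dictionary (a hypothetical stand-in for the bilingual glossary and dictionary) are illustrative only, and clauses are assumed to be pre-tokenized.

```python
def similarity(en_tokens, zh_tokens, lexicon):
    """Hypothetical Dice-style score for a candidate clause pair.

    lexicon maps an English word to a set of acceptable Chinese translations.
    Each Chinese token can be matched at most once, so the score stays
    between 0 and 1."""
    if not en_tokens or not zh_tokens:
        return 0.0
    available = list(zh_tokens)
    matched = 0
    for word in en_tokens:
        for translation in lexicon.get(word, ()):
            if translation in available:
                available.remove(translation)  # consume the matched token
                matched += 1
                break
    return 2 * matched / (len(en_tokens) + len(zh_tokens))


def align(en_clauses, zh_clauses, lexicon):
    """Monotone alignment maximizing total similarity (an assumed design).

    Allowed steps: link one English clause to one Chinese clause (1-1),
    or skip a clause on either side (1-0 / 0-1). Returns the 1-1 links as
    (English index, Chinese index) pairs."""
    n, m = len(en_clauses), len(zh_clauses)
    # best[i][j]: best total score for the first i English, j Chinese clauses
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:
                s = similarity(en_clauses[i - 1], zh_clauses[j - 1], lexicon)
                options.append((best[i - 1][j - 1] + s, (i - 1, j - 1)))
            if i > 0:
                options.append((best[i - 1][j], (i - 1, j)))
            if j > 0:
                options.append((best[i][j - 1], (i, j - 1)))
            best[i][j], back[i][j] = max(options)
    # Trace back from the end, keeping only the 1-1 links.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if (pi, pj) == (i - 1, j - 1):
            pairs.append((i - 1, j - 1))
        i, j = pi, pj
    return pairs[::-1]
```

Because each Chinese token is consumed at most once per clause pair, a clause cannot score above 1 simply by repeating English words that share a translation.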
-
MICE: A module for Named Entities Recognition and Classification
pp. 53–68
In the field of corpus linguistics, Named Entity treatment includes the recognition and classification of different types of discursive elements such as proper names, dates, times, etc. These discursive elements play an important role in various Natural Language Processing applications and techniques, such as Information Retrieval, Information Extraction, translation memories and document routers.
-
The notion of a “lemma”: Headwords, roots and lexical sets
Author(s): Gerry Knowles and Zuraidah Mohd Don
pp. 69–81
The notion of a lemma is so familiar in corpus linguistics that it scarcely needs a formal definition. When a wordlist or a text is lemmatised, the process is apparently transparent, so that any observer can understand how the lemma relates to the original set or string of words. We shall argue in this paper that, on the contrary, the concept of the lemma is not well defined and is in need of a clear formal definition. The lemma is a fundamental concept in the processing of texts in at least some languages, a point we shall illustrate with respect to Arabic and Malay. It so happens that English lemmas are not typical of the general category, so that linguists who base their understanding of the lemma on English obtain a distorted view. It is essential to reverse the direction of the argument: to start with a general understanding of the lemma and to consider English lemmas in that wider context.
-
Modality in Czech and English: Possibility particles and the conditional mood in a parallel corpus
Author(s): František Čermák and Aleš Klégr
pp. 83–95
The paper examines two kinds of modality exponents and their interlingual relationships, using an aligned parallel minicorpus of two contemporary Czech originals (a drama and a novel) and their English translations. It focuses on the four most frequent Czech adverbial particles of possibility/approximation, snad, možná, asi and nejspíš, and on the Czech conditional mood marker by, together with their equivalents in the texts. It contrasts these findings with the equivalents given in the latest and largest Czech-English dictionary. The results confirm that in both cases the lexicographic description is insufficient, both in the range of equivalents offered and in their respective representativeness.
-
Extending collostructional analysis: A corpus-based perspective on 'alternations'
Author(s): Stefan Th. Gries and Anatol Stefanowitsch
pp. 97–129
This paper introduces an extension of distinctive-collocate analysis that takes into account grammatical structure and is specifically geared to investigating pairs of semantically similar grammatical constructions and the lexemes that occur in them. The method, referred to as 'distinctive-collexeme analysis', identifies lexemes that exhibit a strong preference for one member of the pair as opposed to the other, and thus makes it possible to identify subtle distributional differences between the members of such a pair. The method can be applied in the context of what is sometimes referred to as 'grammatical alternation' (e.g. the dative alternation), but it can also be applied to other choices provided by the grammar (such as the two future tense constructions in English). The method has two main applications. First, it can reveal subtle differences between seemingly synonymous constructions, many of which are difficult to identify on the basis of more traditional approaches. Second, it can be used to investigate the very notion of 'alternation'; we show that many alternations are much more restricted than has hitherto been assumed, and thus confirm the claims of recent, non-derivational views of grammar.
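The core computation of distinctive-collexeme analysis can be sketched for a pair of constructions A and B. The abstract does not name the statistic, so this sketch assumes a one-tailed Fisher-Yates exact test on a 2x2 table (the lexeme vs. all other lexemes, construction A vs. construction B), reporting -log10(p) so that larger values mean a stronger preference. The function names and the input format (dictionaries mapping each lexeme to its frequency in each construction) are hypothetical.

```python
from math import comb, log10


def fisher_right_tail(a, b, c, d):
    """One-tailed Fisher-Yates exact test on the 2x2 table

                        constr. A   constr. B
        lexeme              a           b
        other lexemes       c           d

    Returns P(X >= a) under the hypergeometric distribution with the
    table's margins held fixed."""
    row1 = a + b  # lexeme frequency across both constructions
    col1 = a + c  # total size of construction A
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def distinctive_collexemes(freq_a, freq_b):
    """freq_a, freq_b: hypothetical dicts mapping lexeme -> frequency in
    construction A and construction B. Returns lexeme -> (preferred
    construction, -log10 p); larger values mean a stronger preference."""
    total_a, total_b = sum(freq_a.values()), sum(freq_b.values())
    results = {}
    for lex in set(freq_a) | set(freq_b):
        a, b = freq_a.get(lex, 0), freq_b.get(lex, 0)
        c, d = total_a - a, total_b - b
        expected_a = (a + b) * total_a / (total_a + total_b)
        if a >= expected_a:
            pref, p = "A", fisher_right_tail(a, b, c, d)
        else:
            # Mirror the table so that construction B is the first column.
            pref, p = "B", fisher_right_tail(b, a, d, c)
        results[lex] = (pref, -log10(p) if p > 0 else float("inf"))
    return results
```

For example, `distinctive_collexemes({"give": 20, "send": 5}, {"give": 5, "send": 20})` (invented counts) marks give as distinctive for A and send for B with equal strength, since the two tables are mirror images of each other.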
-
"Utterly content in each other's company": Semantic prosody and semantic preference
Author(s): Alan Partington
pp. 131–156
In this paper I wish to examine the two related concepts of semantic prosody and semantic preference. I will begin the section on each with a definition, attempt a review of relevant current positions and then describe a number of corpus-based experiments I conducted to throw light on the two phenomena. Finally, I will try to draw some conclusions about the relationship between them.