International Journal of Corpus Linguistics - Volume 2, Issue 1, 1997
-
Matching Corpus Translations with Dictionary Senses: Two Case Studies
Author(s): Alexander Geyken
pp. 1–22 (22)
This paper addresses the question of the extent to which translations in bilingual parallel corpora match dictionary senses. Automatic matching of corpus translations with dictionary senses depends on the quality of the lexicographic knowledge used, the quality of corpus processing, the impact of the statistics used to filter relevant entries from the corpora, and finally the quality of the translations in the multilingual corpora. We focus on the influence of this last variable on the performance of automatic matching. Like previous approaches, we relied on machine-readable dictionaries (MRDs), a part-of-speech tagger, and bilingually aligned corpora. Additionally, we used a shallow sentence parser for syntactic matching. Two case studies were conducted with two corpora from different domains. Our test set was the intersection of a list of 500 French communication verbs with the verbs occurring in the corpora. The results confirm that the performance of automatic matching varies considerably with the translation quality of the parallel texts.
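The core matching step in this abstract (does the translation found in an aligned corpus correspond to one of the dictionary's senses?) can be sketched as follows. This is a minimal illustration, not the author's pipeline: it assumes verb pairs have already been lemmatized and extracted from aligned sentences, and it skips the tagging and shallow parsing the paper uses. `DICTIONARY`, `match_senses`, and `match_rate` are hypothetical names, and the entries are toy data, not taken from any real MRD.

```python
# Hypothetical toy dictionary (not from any real MRD):
# French verb -> {sense label: set of English translations listed for that sense}.
DICTIONARY = {
    "dire": {"say": {"say", "tell"}, "mean": {"mean"}},
    "annoncer": {"announce": {"announce", "declare"}, "foretell": {"herald", "foreshadow"}},
}


def match_senses(verb, translation, dictionary):
    """Return the sense labels of `verb` whose translations include `translation`."""
    senses = dictionary.get(verb, {})
    return [label for label, translations in senses.items() if translation in translations]


def match_rate(aligned_pairs, dictionary):
    """Share of (French verb, English translation) pairs that match at least one sense."""
    if not aligned_pairs:
        return 0.0
    matched = sum(1 for verb, tr in aligned_pairs if match_senses(verb, tr, dictionary))
    return matched / len(aligned_pairs)


# Hypothetical lemmatized verb pairs extracted from aligned sentences.
pairs = [("dire", "say"), ("dire", "mean"), ("annoncer", "report"), ("annoncer", "announce")]
print(match_senses("dire", "tell", DICTIONARY))  # ['say']
print(match_rate(pairs, DICTIONARY))  # 0.75
```

In this toy run, a free rendering such as "report" matches no listed sense and lowers the rate, which is one way translation quality could show up in a matching score.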
-
Making Sense of Corpus Data: A Case Study of Verbs of Sound
Author(s): Beth Levin and Grace Song
pp. 23–64 (42)
This paper demonstrates the essential role of corpus data in the development of a theory that explains and predicts word behavior. We make this point through a case study of verbs of sound, drawing our evidence primarily from the British National Corpus. We begin by considering pretheoretic notions of the verbs of sound as presented in corpus-based dictionaries and then contrast them with the predictions made by a theory of syntax, as represented by Chomsky's Government-Binding framework. We identify and classify the transitive uses of sixteen representative verbs of sound found in the corpus data. Finally, we consider what a linguistic account with both syntactic and lexical semantic components has to offer as an explanation of observed differences in the behavior of the sample verbs.
-
Detecting the Organization of Semantic Subclasses of Japanese Verbs
Author(s): Akira Oishi and Yuji Matsumoto
pp. 65–89 (25)
This paper describes an approach to detecting the organization of semantic subclasses of Japanese verbs. First, we classify verbs along two dimensions: thematic and aspectual. In the thematic dimension, we exploit the patterns of case-marking particles attached to the arguments of verbs. In the aspectual dimension, we exploit the classification of the adverbs that modify verbs in a corpus. By combining the results of the two classifications, we obtain a more detailed classification of verbs. We can also capture the prototypicality of the members of each semantic subclass by taking into account the frequencies of case-particle patterns and co-occurring adverbs. Moreover, the close relationships among these subclasses enable us to detect their organization.
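One simple way to picture "combining the results of two classifications" with a frequency-based notion of prototypicality is sketched below. This is an assumption-laden illustration, not the authors' method: each verb is reduced to its single most frequent case-particle frame and its most frequent co-occurring adverb class, subclasses are the cross-product of the two, and prototypicality is taken (hypothetically) as the product of the two dominant shares. The verb counts and the labels `"ga-wo"`, `"duration"`, etc. are invented toy data.

```python
from collections import Counter, defaultdict


def dominant(counts):
    """Return the most frequent label in `counts` and its share of all occurrences."""
    label, freq = counts.most_common(1)[0]
    return label, freq / sum(counts.values())


def classify(verbs):
    """Cross a thematic and an aspectual classification of verbs.

    `verbs` maps each verb to (case-frame counts, adverb-class counts).
    Returns {(case frame, adverb class): [(verb, score), ...]}, members sorted
    from most to least prototypical.
    """
    classes = defaultdict(list)
    for verb, (frames, adverbs) in verbs.items():
        frame, frame_share = dominant(frames)
        adverb_class, adverb_share = dominant(adverbs)
        # Assumed prototypicality score: product of the two dominant shares.
        classes[(frame, adverb_class)].append((verb, frame_share * adverb_share))
    return {key: sorted(members, key=lambda m: m[1], reverse=True)
            for key, members in classes.items()}


# Hypothetical counts; frame and adverb-class labels are illustrative only.
VERBS = {
    "taberu": (Counter({"ga-wo": 8, "ga": 2}), Counter({"duration": 3, "manner": 1})),
    "nomu": (Counter({"ga-wo": 9, "ga": 1}), Counter({"duration": 4})),
    "tsuku": (Counter({"ga-ni": 6, "ga": 4}), Counter({"completion": 5, "duration": 1})),
}
for subclass, members in classify(VERBS).items():
    print(subclass, [verb for verb, _ in members])
```

Keeping only the dominant pattern is the crudest choice; a fuller treatment would compare whole frequency distributions rather than their top label.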
-
Unsupervised Learning of Linguistic Structure: An Empirical Evaluation
Author(s): David Powers
pp. 91–131 (41)
Computational Linguistics and Natural Language have long been targets for Machine Learning, and a variety of learning paradigms and techniques have been employed with varying degrees of success. In this paper, we review approaches that have adopted an unsupervised learning paradigm, explore the assumptions underlying the techniques used, and develop an approach to empirical evaluation. We concentrate on a statistical framework based on N-grams, although we seek to maintain neurolinguistic plausibility. The model we adopt places putative linguistic units in focus and associates each with a characteristic vector of statistics derived from occurrence frequencies. These vectors are treated as defining a hyperspace, within which we demonstrate a technique for examining the empirical utility of the various metrics and the normalization, visualization, and clustering techniques proposed in the literature. We conclude with an evaluation of the relative utility of a large array of metrics and processing techniques with respect to our defined performance criteria.
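The idea of giving each unit a vector of N-gram statistics and comparing units with a metric in that space can be sketched minimally as follows. The statistics here are assumed to be left- and right-neighbour bigram counts and the metric is cosine similarity; these are just one of the many combinations such an evaluation might cover, and the function names are hypothetical rather than the author's model.

```python
import math
from collections import Counter, defaultdict


def context_vectors(tokens):
    """Map each word type to bigram-based counts of its left ("L:") and right ("R:") neighbours."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if i > 0:
            vectors[word]["L:" + tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            vectors[word]["R:" + tokens[i + 1]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; it ignores overall frequency (vector length)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(word, vectors):
    """Rank the other word types by cosine similarity to `word`."""
    target = vectors.get(word, Counter())
    scores = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda item: item[1], reverse=True)


tokens = "the cat sat on the mat . the dog sat on the rug .".split()
vectors = context_vectors(tokens)
print([(w, round(s, 2)) for w, s in nearest("cat", vectors)[:3]])
# [('dog', 1.0), ('mat', 0.5), ('rug', 0.5)]
```

Swapping cosine for, say, Euclidean distance on raw counts would make a frequent word look unlike a rare word with identical contexts, which is why the choice of metric and normalization matters for clustering.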
-
Social Differentiation in the Use of English Vocabulary: Some Analyses of the Conversational Component of the British National Corpus
Author(s): Paul Rayson, Geoffrey N. Leech and Mary Hodges
pp. 133–152 (20)
In this article, we undertake selective quantitative analyses of the demographically sampled spoken English component of the British National Corpus (for brevity, referred to here as the "Conversational Corpus"). This is a subcorpus of c. 4.5 million words, in which speakers and respondents (see I below) are identified by such factors as gender, age, social group, and geographical region. Using a corpus analysis tool developed at Lancaster, we compare the vocabulary of speakers, highlighting those differences between sectors of the corpus (defined by gender, age, and social group) that are marked by a very high χ² value. A fourth variable, the geographical region of the United Kingdom, is not investigated in this article, although it remains a promising subject for future research. (As background, we also briefly examine differences between spoken and written material in the British National Corpus [BNC].) This study illustrates the potential of the Conversational Corpus for future corpus-based research on social differentiation in the use of language. There are evident limitations, including (a) the reliance on vocabulary frequency lists and (b) the simplicity of the transcription system employed for the spoken part of the BNC. The article concludes by considering future advances in the research paradigm illustrated here.
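A χ² comparison of one word across two sectors of a corpus can be sketched as Pearson's statistic on a 2×2 table (rows: sector A and sector B; columns: this word and all other words). This is a minimal sketch assuming that form of the test; the abstract does not give the exact computation used by the Lancaster tool, and `chi_square_2x2` and the counts below are hypothetical, not BNC figures.

```python
def chi_square_2x2(freq_a, total_a, freq_b, total_b):
    """Pearson chi-square for one word in two (sub)corpora A and B.

    The 2x2 table is: rows = corpus A / corpus B,
    columns = this word / all other words.
    freq_*: occurrences of the word; total_*: corpus size in words.
    """
    grand = total_a + total_b
    word_total = freq_a + freq_b
    other_total = grand - word_total
    if 0 in (total_a, total_b, word_total, other_total):
        return 0.0  # an empty row or column gives no evidence of a difference
    observed = [[freq_a, total_a - freq_a], [freq_b, total_b - freq_b]]
    row_totals = [total_a, total_b]
    col_totals = [word_total, other_total]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical counts (not BNC figures): 100 vs. 50 hits in two 10,000-word samples.
print(round(chi_square_2x2(100, 10_000, 50, 10_000), 2))  # 16.79
```

With one degree of freedom, values above 3.84 correspond to p < 0.05 and values above 10.83 to p < 0.001, so ranking words by this statistic surfaces the most marked differences between sectors.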
-
Multilingual Natural Language Processing
Author(s): Gregory Grefenstette and Frédérique Segond
pp. 153–162 (10)
-
Abstracts
Author(s): Hilde Hasselgård, Juhani Klemola, Susan Pintzuk and Merja Kytö
pp. 173–179 (7)