International Journal of Corpus Linguistics - Volume 5, Issue 1, 2000
-
Fishing for Translation Equivalents Using Grammatical Anchors
Author(s): Tamás Váradi
pp. 1–16 (16)
Bilingual parallel corpora offer a treasure house of human translators’ knowledge of the correspondences between the two languages. Extracting by automatic means the translation equivalents deemed accurate and contextually appropriate by a human translator is of great practical importance for various fields such as example-based machine translation, computational lexicography, and information retrieval. The task of word- or phrase-level identification is greatly reduced if suitable anchor points can be found in the stream of texts. It is suggested that grammatical morphemes provide very useful clues to finding translation equivalents. They typically form a closed set, occur frequently enough in sentences, have more or less fixed meanings, and, most important, will stand in a one-to-one or at most one-to-few relationship with corresponding elements in the other language. This paper will explore the viability of the idea with reference to the Hungarian and English versions of Plato’s Republic, which are available in sentence-aligned form. Hungarian has a rich set of suffixes which are typically deployed in a concatenated manner. Corresponding to them in English are prepositions, auxiliary words, and suffixes. The paper will show how, by starting from a well-defined set of correspondences between Hungarian grammatical morphemes and their equivalents and using a combination of pattern matching and heuristics, one can arrive at a mapping of phrases between the two texts.
-
Towards a Methodology for Exploiting Specialized Target Language Corpora as Translation Resources
Author(s): Lynne Bowker
pp. 17–52 (36)
Specialized target language (TL) corpora constitute an extremely valuable resource for translators, and although no specialized tools have been developed for extracting translation data from such corpora, this paper argues that translators would be remiss not to consult such resources. We describe the advantages of using specialized TL corpora and outline a number of techniques that translators can use in order to extract translation data from such corpora with the aid of generic corpus analysis tools. These advantages and techniques are demonstrated with reference to two translations, one of which was done using only conventional resources and the other with the help of a corpus.
-
A Proposal for Improving the Measurement of Parse Accuracy
Author(s): Geoffrey Sampson
pp. 53–68 (16)
Widespread dissatisfaction has been expressed with the measure of parse accuracy used in the Parseval programme, based on the location of constituent boundaries. Scores on the Parseval metric are perceived as poorly correlated with intuitive judgments of goodness of parse; the metric applies only to a restricted range of grammar formalisms; and it is seen as divorced from applications of NLP technology. The present paper defines an alternative metric, which measures the accuracy with which successive words are fitted into parse trees. (The original statement of this metric is believed to have been the earliest published proposal about quantifying parse accuracy.) The metric defined here gives overall scores that quantify intuitive concepts of good and bad parsing relatively directly, and it gives scores for individual words which enable the location of parsing errors to be pinpointed. It applies to a wider range of grammar formalisms, and is tunable for specific parsing applications.
-
Grammatical Tagging of a Persian Corpus
Author(s): S. Mostafa Assi
pp. 69–81 (13)
The purpose of this article is to briefly introduce an interactive POS tagging system developed as a project at the Institute for Humanities and Cultural Studies in Tehran, Iran. The system is designed as part of the annotation procedure for a Persian corpus called the Farsi Linguistic Database (FLDB), a project at the same institute which comprises a selection of contemporary Modern Persian literature, formal and informal spoken varieties of the language, and a series of dictionary entries and word lists [Assi 1997: 5]. It is the first attempt ever to tag a Persian corpus. In Section 1, the project itself will be introduced; Section 2 presents an evaluation of the project, and Section 3 is allocated to some suggestions for future work.
-
Co-occurrence Tendencies of Loanwords in Corpora
Author(s): Petek Kurtböke and Liz Potter
pp. 83–100 (18)
This paper investigates some major approaches to the analysis of foreign material in text, commonly known as loanwords. While the nature of data may differ in various fields of linguistic research (e.g., bilingual vs. monolingual corpora), perspectives on the analysis of such material have not differed: loanwords have traditionally been analysed as singly-occurring items out of context. However, corpus research has shown that words rarely occur in isolation. On the basis of a number of English loans in a corpus of Turkish compiled in a multilingual setting, and a number of Italian loans in a corpus of English compiled in a monolingual setting, we conclude that collocational patterns growing around loanwords are significant and should be included in the treatment of loanwords.