International Journal of Corpus Linguistics - Volume 5, Issue 1, 2000

EDITORIAL

pp.: iii–iii

https://doi.org/10.1075/ijcl.5.1.01edi

Fishing for Translation Equivalents Using Grammatical Anchors

Author(s): Tamás Váradi

pp.: 1–16 (16)

https://doi.org/10.1075/ijcl.5.1.02var
Bilingual parallel corpora offer a treasure house of human translators’ knowledge of the correspondences between the two languages. Extracting by automatic means the translation equivalents deemed accurate and contextually appropriate by a human translator is of great practical importance for various fields such as example-based machine translation, computational lexicography, information retrieval, etc. The task of word or phrase level identification is greatly reduced if suitable anchor points can be found in the stream of texts. It is suggested that grammatical morphemes provide very useful clues to finding translation equivalents. They typically form a closed set, occur frequently enough in sentences, have more or less fixed meanings, and, most important, will stand in a one-to-one or at most one-to-few relationship with corresponding elements in the other language. This paper will explore the viability of the idea with reference to the Hungarian and English versions of Plato’s Republic, which are available in sentence-aligned form. Hungarian has a rich set of suffixes which are typically deployed in a concatenated manner. Corresponding to them in English are prepositions, auxiliary words, and suffixes. The paper will show how, by starting from a well-defined set of correspondences between Hungarian grammatical morphemes and their equivalents and using a combination of pattern matching and heuristics, one can arrive at a mapping of phrases between the two texts.

Towards a Methodology for Exploiting Specialized Target Language Corpora as Translation Resources

Author(s): Lynne Bowker

pp.: 17–52 (36)

https://doi.org/10.1075/ijcl.5.1.03bow
Specialized target language (TL) corpora constitute an extremely valuable resource for translators, and although no specialized tools have been developed for extracting translation data from such corpora, this paper argues that translators would be remiss not to consult such resources. We describe the advantages of using specialized TL corpora and outline a number of techniques that translators can use in order to extract translation data from such corpora with the aid of generic corpus analysis tools. These advantages and techniques are demonstrated with reference to two translations, one of which was done using only conventional resources and the other with the help of a corpus.

A Proposal for Improving the Measurement of Parse Accuracy

Author(s): Geoffrey Sampson

pp.: 53–68 (16)

https://doi.org/10.1075/ijcl.5.1.04sam
Widespread dissatisfaction has been expressed with the measure of parse accuracy used in the Parseval programme, based on the location of constituent boundaries. Scores on the Parseval metric are perceived as poorly correlated with intuitive judgments of goodness of parse; the metric applies only to a restricted range of grammar formalisms; and it is seen as divorced from applications of NLP technology. The present paper defines an alternative metric, which measures the accuracy with which successive words are fitted into parse trees. (The original statement of this metric is believed to have been the earliest published proposal about quantifying parse accuracy.) The metric defined here gives overall scores that quantify intuitive concepts of good and bad parsing relatively directly, and it gives scores for individual words which enable the location of parsing errors to be pinpointed. It applies to a wider range of grammar formalisms, and is tunable for specific parsing applications.

Grammatical Tagging of a Persian Corpus

Author(s): S. Mostafa Assi

pp.: 69–81 (13)

https://doi.org/10.1075/ijcl.5.1.05ass
The purpose of this article is to briefly introduce an interactive POS tagging system developed as a project at the Institute for Humanities and Cultural Studies in Tehran, Iran. The system is designed as part of the annotation procedure for a Persian corpus called the Farsi Linguistic Database (FLDB), also a project of the same institute, which comprises a selection of contemporary Modern Persian literature, formal and informal spoken varieties of the language, and a series of dictionary entries and word lists (Assi 1997: 5). It is the first attempt ever to tag a Persian corpus. Section 1 introduces the project, Section 2 presents an evaluation of it, and Section 3 offers some suggestions for future work.

Co-occurrence Tendencies of Loanwords in Corpora

Author(s): Petek Kurtböke and Liz Potter

pp.: 83–100 (18)

https://doi.org/10.1075/ijcl.5.1.06kur
This paper investigates some major approaches to the analysis of foreign material in text, commonly known as loanwords. While the nature of the data may differ across fields of linguistic research (e.g., bilingual vs. monolingual corpora), perspectives on the analysis of such material have not differed: loanwords have traditionally been analysed as singly occurring items out of context. However, corpus research has shown that words rarely occur in isolation. On the basis of a number of English loans in a corpus of Turkish compiled in a multilingual setting, and a number of Italian loans in a corpus of English compiled in a monolingual setting, we conclude that the collocational patterns growing around loanwords are significant and should be included in the treatment of loanwords.

Reviews

pp.: 101–116 (16)

https://doi.org/10.1075/ijcl.5.1.07rev

Abstracts

pp.: 117–120 (4)

https://doi.org/10.1075/ijcl.5.1.08abs
