1887
Volume 5, Issue 1
  • ISSN 1384-6655
  • E-ISSN: 1569-9811
USD
Buy:$35.00 + Taxes

Abstract

Bilingual parallel corpora offer a treasure house of human translator’s knowledge of the correspondences between the two languages. Extracting by automatic means the translation equivalents deemed accurate and contextually appropriate by a human translator is of great practical importance for various fields such as example-based machine translation, computational lexicography, information retrieval, etc. The task of word or phrase level identification is greatly reduced if suitable anchor points can be found in the stream of texts. It is suggested that grammatical morphemes provide very useful clues to finding translation equivalents. They typically form a closed set, occur frequently enough in sentences, have more or less fixed meanings, and, most important, will stand in a one-to-one or at most one-to-few relationship with corresponding elements in the other language. This paper will explore the viability of the idea with reference to the Hungarian and English versions of Plato’s Republic, which are available in sentence-aligned form. Hungarian has a rich set of suffixes which are typically deployed in a concatenated manner. Corresponding to them in English are prepositions, auxiliary words, and suffixes. The paper will show how, by starting from a well defined set of correspondences between Hungarian grammatical morphemes and their equivalents and using a combination of pattern matching and heuristics, one can arrive at a mapping of phrases between the two texts.

Loading

Article metrics loading...

/content/journals/10.1075/ijcl.5.1.02var
2000-01-01
2025-04-24
Loading full text...

Full text loading...

/content/journals/10.1075/ijcl.5.1.02var
Loading
  • Article Type: Research Article
Keyword(s): multilingual alignment; parallel corpora
This is a required field
Please enter a valid email address
Approval was successful
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error