Text Corpora and Multilingual Lexicography
  • ISSN 1384-6655
  • E-ISSN: 1569-9811

Abstract

We present experimental results on the automatic extraction of a Czech-English translation dictionary. Two bilingual corpora were created: a computer-oriented corpus of 119,886 sentence pairs and a journalistic corpus of 58,137 sentence pairs. For sentence alignment we used the length-based statistical method (Gale and Church 1991); for dictionary extraction we used a noun phrase marker based on a regular grammar, together with a probabilistic model (Brown et al. 1993). The resulting dictionaries contain around 6,000 entries. After significance filtering, weighted precision is 86.4% for the computer-oriented and 70.7% for the journalistic Czech-English dictionary.
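
As a rough illustration of the length-based alignment idea cited above, the following minimal sketch scores how plausible it is that two sentences of given character lengths are translations of each other. It is not the authors' implementation: the function names are hypothetical, and the parameters `C` and `S2` are the values commonly quoted from Gale and Church's work, assumed here because the abstract does not state the values used for Czech-English.

```python
import math

# Assumed parameters (not stated in the abstract): values commonly quoted
# from Gale and Church for their own language pairs.
C = 1.0   # expected number of target characters per source character
S2 = 6.8  # variance of the length difference per source character


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_cost(len_src, len_tgt, c=C, s2=S2):
    """Cost of pairing a source and a target sentence by length alone.

    Returns the negative log of the two-sided tail probability of the
    normalized length difference; lower cost means a more plausible pair.
    """
    if len_src <= 0 or len_tgt < 0:
        raise ValueError("len_src must be positive and len_tgt non-negative")
    delta = (len_tgt - c * len_src) / math.sqrt(len_src * s2)
    p = 2.0 * (1.0 - normal_cdf(abs(delta)))
    # Clamp to avoid log(0) for extreme length mismatches.
    return -math.log(max(p, 1e-300))
```

In a full aligner, a cost of this kind would typically be combined with a prior for each match type (one-to-one, one-to-two, and so on) and minimized over the whole text by dynamic programming; that part is omitted here.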

/content/journals/10.1075/ijcl.6.si.02cme
2001-12-17
2024-09-17
  • Article Type: Research Article