1887
Volume 7, Issue 1
  • ISSN 1384-6655
  • E-ISSN: 1569-9811
USD
Buy:$35.00 + Taxes

Abstract

The paper presents an annotated parallel Slovene-English corpus developed in the scope of the EU ELAN project. The IJS-ELAN corpus was compiled to be a widely distributable dataset for language engineering and for translation and terminology studies. The corpus contains 1 million words from fifteen recent terminology-rich texts. The corpus is sentence aligned and word-tagged with context disambiguated morphosyntactic descriptions and lemmas. These descriptions model simple feature structures, the structure of which is shared between Slovene and English. The corpus is encoded according to the Guidelines for Text Encoding and Interchange and is freely available on the Web for downloading. Additionally, access to IJS-ELAN is available via a powerful Web concordancer.

Loading

Article metrics loading...

/content/journals/10.1075/ijcl.7.1.01erj
2002-01-01
2024-11-10
Loading full text...

Full text loading...

/content/journals/10.1075/ijcl.7.1.01erj
Loading
  • Article Type: Research Article
Keyword(s): concordancing; corpus encoding; parallel corpus; tagging
This is a required field
Please enter a valid email address
Approval was successful
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error