Volume 3, Issue 1
  • ISSN 1384-6655
  • E-ISSN: 1569-9811

Abstract

We describe and experimentally evaluate an alternative algorithm for aligning and extracting vocabulary from parallel texts using recency vectors and a similarity measure based on Levenshtein distance. The work is largely inspired by Fung and McKeown's DK-vec, though we use a simpler algorithm. The technique is tested on two sets of parallel corpora involving English, French, German, Dutch, Spanish, and Japanese. We attempt to evaluate the importance of parameters such as frequency of words chosen as candidates, the effect of different language pairings, and differences between the two corpora.
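
As an illustration only, the general idea of comparing recency vectors by Levenshtein distance might be sketched as follows. The abstract gives no implementation details, so everything here is an assumption rather than the authors' procedure: a recency vector is taken to be the list of token gaps between successive occurrences of a word, the optional `buckets` parameter (hypothetical) rescales gaps by text length so that texts of different sizes are comparable, and the normalisation of the distance into a 0..1 similarity score is one choice among many. The function names `recency_vector`, `levenshtein`, and `similarity` are likewise hypothetical.

```python
from typing import List, Optional, Sequence


def recency_vector(tokens: Sequence[str], word: str,
                   buckets: Optional[int] = None) -> List[int]:
    """Gaps, in tokens, between successive occurrences of `word`.

    Assumption: if `buckets` is given, each gap is rescaled to a share of
    the text length so that texts of different sizes can be compared.
    """
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if buckets is not None and len(tokens) > 0:
        gaps = [g * buckets // len(tokens) for g in gaps]
    return gaps


def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    """Standard edit distance (insert, delete, substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed normalisation: 1.0 for identical vectors, 0.0 for maximal distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Under these assumptions, a candidate translation pair would be a source word and a target word whose recency vectors score a high `similarity`, for example by computing `recency_vector(source_tokens, w, buckets=100)` and `recency_vector(target_tokens, v, buckets=100)` for each candidate pair `(w, v)` and ranking the results.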

/content/journals/10.1075/ijcl.3.1.06som
1998-01-01
2019-12-12

  • Article Type: Research Article
Keyword(s): Levenshtein Distance, Parallel Corpora, Text Alignment, Vocabulary Estimation, Word Alignment