Volume 7, Issue 2
  • ISSN 0929-9971
  • E-ISSN: 1569-9994


This paper focuses on improving statistically extracted phrase lists by applying word alignment approaches to bitext. Such phrase lists support tasks such as the compilation of terminology or translation databases. Our investigations are based on the assumption that word alignment favors well-formed phrase structures over irregular text segments. If this assumption holds, word alignment will filter out irregular structures from automatically generated phrase lists, so that a phrase list with higher precision can be compiled. Furthermore, word alignment approaches can be used to identify additional multi-word units, e.g. multi-word cognates. Our investigations use a Swedish/English text corpus that has been aligned with the Uppsala Word Aligner (UWA). Finally, we describe and apply three approaches to evaluating the automatically generated phrase lists: a comparison of the results with existing reference data (prior reference), an evaluation against given syntactic patterns (prior reference patterns), and a manual evaluation of sample data (posterior reference). The evaluations of the extraction of phrasal terms in English substantiate the assumption: precision has improved significantly with little loss in recall.
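
The filtering idea and the prior-reference evaluation can be illustrated with a minimal sketch. It is not the UWA implementation or the authors' procedure: the function names `filter_by_alignment` and `precision_recall`, the exact-match criterion, and the toy phrase lists are all assumptions made here for illustration only.

```python
def filter_by_alignment(candidates, aligned_units):
    """Keep only the candidate phrases that also appear as a unit
    in the word alignment output (assumed here to be a list of strings)."""
    aligned = set(aligned_units)
    return [phrase for phrase in candidates if phrase in aligned]


def precision_recall(extracted, reference):
    """Precision and recall of an extracted phrase list against a reference list."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy data, invented for illustration; not taken from the paper's corpus.
candidates = ["fuel tank", "of the", "gear box", "tank is", "oil filter"]
aligned_units = ["fuel tank", "gear box", "oil filter", "of the"]
reference = ["fuel tank", "gear box", "oil filter"]

before = precision_recall(candidates, reference)
filtered = filter_by_alignment(candidates, aligned_units)
after = precision_recall(filtered, reference)

print("before filtering:", before)
print("after filtering: ", after)
```

In this toy case the irregular segment "tank is" is dropped because it never occurs as an aligned unit, which raises precision without removing any reference phrase. Real data would need a looser matching criterion than exact string equality.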

