Terminology across Languages and Domains
  • ISSN: 0929-9971
  • E-ISSN: 1569-9994

Abstract

Domain corpora are often small, and even important terms may occur in them only within more complex constructions rather than as isolated maximal phrases. Appropriate recognition of nested terms can thus influence both the content of the extracted candidate term list and its order. We propose a new method for identifying nested terms that combines two criteria: grammatical correctness and normalised pointwise mutual information (NPMI) computed for all bigrams in a given corpus. NPMI is typically used to recognise strong word connections, but in our solution we use it to find the weakest connections, which suggest the best place to divide a phrase into two parts. By creating at most two nested phrases in each step, we introduce a binary term structure. We test the impact of the proposed method, applied together with the C-value ranking method, on the automatic term recognition task performed on three corpora, two in Polish and one in English.
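
The splitting idea described in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: it covers only the NPMI criterion and leaves out both the grammatical-correctness check and the C-value ranking. The function names `npmi_table` and `split_binary`, the toy corpus, the score assigned to unseen bigrams, and the choice to estimate all probabilities against the total token count are assumptions made for illustration.

```python
import math
from collections import Counter


def npmi_table(sentences):
    """Compute NPMI for every adjacent word pair in tokenised sentences.

    Assumption: unigram and bigram probabilities are both estimated
    against the total token count, which keeps each score in [-1, 1].
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())
    table = {}
    for (x, y), count in bigrams.items():
        p_xy = count / total
        p_x = unigrams[x] / total
        p_y = unigrams[y] / total
        # NPMI = PMI(x, y) / -ln p(x, y)
        table[(x, y)] = math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)
    return table


def split_binary(phrase, table, unseen=-1.0):
    """Recursively split a non-empty phrase at its weakest bigram link.

    Returns a nested tuple, i.e. a binary structure over the words.
    Bigrams missing from the table get the hypothetical score `unseen`.
    """
    if len(phrase) < 2:
        return phrase[0]
    scores = [table.get(pair, unseen) for pair in zip(phrase, phrase[1:])]
    cut = scores.index(min(scores)) + 1  # split after the weakest link
    return (split_binary(phrase[:cut], table, unseen),
            split_binary(phrase[cut:], table, unseen))


# Toy corpus, invented for this example only.
corpus = [
    ["natural", "language", "processing"],
    ["natural", "language"],
    ["natural", "language"],
    ["image", "processing"],
    ["signal", "processing"],
]
table = npmi_table(corpus)
tree = split_binary(["natural", "language", "processing"], table)
```

In this toy corpus the link between "language" and "processing" scores lower than the link between "natural" and "language", so the phrase is divided after "language" and "natural language" is proposed as a nested phrase. A fuller treatment would accept a split only when both parts are grammatically well-formed phrases, as the abstract requires.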

/content/journals/10.1075/term.21.2.03mar
2015-01-01
2018-12-19