Japanese Term Extraction
  • ISSN 0929-9971
  • E-ISSN: 1569-9994

Abstract

We present a system used in the term recognition competition, one of the subtasks covered by the NTCIR tmrec group, and we evaluate its term recognition results. We regard terms as lexical items, characteristic of a field, that have the following three features: (1) they appear frequently in documents of the target field; (2) they are not common words in the target field; and (3) they appear less frequently in the corpora of other fields. Our system draws on corpora from different fields and uses these features to recognize terms. We then analyze the differences between our term list and the manual candidates list produced by the NTCIR tmrec group. In this article we identify features that are important for automatic term recognition. Furthermore, through comparative experiments based on manual candidates, we establish the importance of indices in extracting a term list.
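
The three features listed in the abstract can be illustrated with a small scoring sketch. The abstract does not give the system's actual indices, so the formula below is purely an assumption: the function name `score_terms`, the IDF-style penalty standing in for feature (2), and the smoothed frequency ratio standing in for feature (3) are all hypothetical. It also assumes the Japanese text has already been segmented into tokens.

```python
import math
from collections import Counter


def score_terms(target_docs, other_docs):
    """Score each token of the target field (hypothetical formula).

    target_docs, other_docs: lists of documents, each a list of tokens.
    Returns a dict mapping token -> score; higher means more term-like.
    """
    n_target = len(target_docs)
    # Feature (1): term frequency in the target field.
    tf_target = Counter(tok for doc in target_docs for tok in doc)
    # Feature (2): document frequency in the target field; words that
    # occur in almost every target document are treated as common words.
    df_target = Counter(tok for doc in target_docs for tok in set(doc))
    # Feature (3): term frequency in corpora of other fields.
    tf_other = Counter(tok for doc in other_docs for tok in doc)

    total_target = sum(tf_target.values())
    total_other = sum(tf_other.values())

    scores = {}
    for tok, tf in tf_target.items():
        rel_target = tf / total_target
        # Add-one smoothing so unseen tokens do not divide by zero.
        rel_other = (tf_other[tok] + 1) / (total_other + 1)
        # Assumed IDF-style penalty for words common across target docs.
        idf = math.log((n_target + 1) / df_target[tok])
        # Assumed contrast with other fields: ratio of relative frequencies.
        contrast = math.log(1 + rel_target / rel_other)
        scores[tok] = rel_target * idf * contrast
    return scores
```

With a few toy documents, a word such as "the" that appears in every document of both fields should score below a field-specific word, which is the behaviour the three features are meant to capture. Any real system would need to tune or replace these indices.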

/content/journals/10.1075/term.6.2.07uch
2000-01-01
2024-09-09
  • Article Type: Research Article
Keyword(s): corpora; document frequency; field; term frequency; term recognition