Volume 10, Issue 2
  • ISSN 1384-6655
  • E-ISSN: 1569-9811


Recently, corpus comparison has been used by a number of researchers for extracting single-word terms (SWTs) from specialized corpora. It is viewed as a means to supplement multi-word term (MWT) extraction, the focus of which is on noun phrases. However, little is known about the value of this technique in a terminological setting. This paper examines two different methods for finding French SWTs in the field of computing. The first one (M1) compares the specialized corpus to a corpus considered to be a reflection of language as a whole. The second one (M2) breaks down the specialized corpus into six topical subcorpora that are compared in turn to the entire specialized corpus. The calculation relies on the standard normal distribution and is carried out by a program called TermoStat. The specific units produced by both methods are then evaluated by comparing them to the contents of two specialized dictionaries. We also compare the results yielded by the two methods. Results show that precision is fair (approximately 50% of units extracted by both methods can be found in specialized dictionaries). However, recall is lower in both methods. Results also reveal that, even though M1 yields better results than M2, both methods are useful for identifying SWTs and should be considered in terminological work.
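
The abstract does not give the exact formula used by TermoStat, so the following is only a minimal sketch of one common way to score specificity with a normal approximation, followed by a dictionary-based evaluation. The function names (`specificity_scores`, `precision_recall`), the pooled-proportion estimate, and the toy tokens are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def specificity_scores(spec_tokens, ref_tokens):
    """Score each word of the specialized corpus with a z-score.

    Assumption: the expected frequency comes from the pooled relative
    frequency of the word in both corpora, and the deviation is scaled
    by a binomial standard deviation (normal approximation).
    """
    spec = Counter(spec_tokens)
    ref = Counter(ref_tokens)
    n_spec = len(spec_tokens)
    n_total = n_spec + len(ref_tokens)
    scores = {}
    for word, observed in spec.items():
        p = (observed + ref[word]) / n_total       # pooled proportion
        expected = n_spec * p                      # expected count in the specialized corpus
        sd = math.sqrt(n_spec * p * (1 - p))       # binomial standard deviation
        scores[word] = (observed - expected) / sd if sd > 0 else 0.0
    return scores


def precision_recall(candidates, dictionary_terms):
    """Compare extracted candidates with the entries of a reference dictionary."""
    candidates = set(candidates)
    dictionary_terms = set(dictionary_terms)
    hits = candidates & dictionary_terms
    precision = len(hits) / len(candidates) if candidates else 0.0
    recall = len(hits) / len(dictionary_terms) if dictionary_terms else 0.0
    return precision, recall


# Toy data (hypothetical): a tiny "specialized" and "reference" corpus.
spec_tokens = ["fichier"] * 5 + ["le"] * 5
ref_tokens = ["le"] * 50 + ["maison"] * 40
scores = specificity_scores(spec_tokens, ref_tokens)

# Keep words whose score exceeds a threshold (1.96 is an illustrative cut-off).
candidates = [w for w, z in scores.items() if z > 1.96]
```

Under these assumptions, an M1-style comparison would pass a general-language corpus as `ref_tokens`, while an M2-style comparison would pass one topical subcorpus as `spec_tokens` and the remainder of the specialized corpus as `ref_tokens`.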


  • Article Type: Research Article
Keyword(s): corpus comparison; single-word term; specialized corpora; term-extraction; terminology