
Full text loading...
Although intended for the “average layman”, both in terms of readability and contents, the current patient information still contains many scientific terms. Different studies have concluded that the use of scientific terminology is one of the factors, which greatly influences the readability of this patient information. The present study deals with the problem of automatic term recognition of overly scientific terminology as a first step towards the replacement of the recognized scientific terms by their popular counterpart. In order to do so, we experimented with two approaches, a dictionary-based approach and a learning-based approach, which is trained on a rich feature vector. The research was conducted on a bilingual corpus of English and Dutch EPARs (European Public Assessment Report). Our results show that we can extract scientific terms with a high accuracy (> 80%, 10% below human performance) for both languages. Furthermore, we show that a lexicon-independent approach, which solely relies on orthographical and morphological information is the most powerful predictor of the scientific character of a given term.