
Narrative clinical records and biomedical articles constitute rich sources of information about phenotypes, i.e., markers distinguishing individuals with specific medical conditions from the general population. Phenotypes help clinicians provide personalised treatments. However, locating information about them within huge document repositories is difficult, since each phenotypic concept can be mentioned in many ways. Normalisation methods automatically map divergent phrases to unique concepts in domain-specific terminologies, so that all mentions of a concept of interest can be located and linked. We have developed a hybrid normalisation method (HYPHEN) to handle concept mentions with wide-ranging characteristics across different text types. HYPHEN integrates various normalisation techniques that handle surface-level variations (e.g., differences in word order, word forms or acronyms/abbreviations) and lexical-level variations (where terms have similar meanings, but potentially unrelated forms). HYPHEN achieves robust performance for both biomedical academic text and narrative clinical records, and can significantly outperform related methods.
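To make the idea of normalisation concrete, the following is a minimal sketch of how divergent mentions might be mapped to a single concept identifier. It is not HYPHEN's implementation: the terminology, concept IDs, abbreviation table, synonym table and the crude plural-stripping rule are all hypothetical stand-ins, chosen only to illustrate surface-level variation (word order, word forms, abbreviations) and a simple case of lexical-level variation (synonyms).

```python
import re

# Hypothetical toy terminology: concept ID -> preferred name (IDs are made up).
TERMINOLOGY = {
    "C001": "shortness of breath",
    "C002": "chronic obstructive pulmonary disease",
    "C003": "heart failure",
}

# Hypothetical lookup tables; a real system would draw these from curated resources.
ABBREVIATIONS = {
    "copd": "chronic obstructive pulmonary disease",
    "sob": "shortness of breath",
}
SYNONYMS = {"dyspnea": "shortness of breath", "cardiac": "heart"}
STOPWORDS = {"of", "the", "a", "an", "in"}


def stem(token):
    """Crude word-form handling: strip a plural 's' (an assumption, not a real stemmer)."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def surface_key(phrase):
    """Order-insensitive key: lowercase, drop stopwords, stem, then sort the tokens."""
    tokens = re.findall(r"[a-z0-9]+", phrase.lower())
    return tuple(sorted(stem(t) for t in tokens if t not in STOPWORDS))


def expand(mention):
    """Expand a whole-mention abbreviation, then replace known synonyms token by token."""
    text = mention.lower().strip()
    text = ABBREVIATIONS.get(text, text)
    return " ".join(SYNONYMS.get(tok, tok) for tok in text.split())


# Index the terminology by surface key so that variants collapse onto one entry.
INDEX = {surface_key(name): cid for cid, name in TERMINOLOGY.items()}


def normalise(mention):
    """Return the concept ID for a mention, or None if no concept matches."""
    return INDEX.get(surface_key(expand(mention)))


if __name__ == "__main__":
    for mention in ["Failure, heart", "heart failures", "COPD", "dyspnea", "broken arm"]:
        print(mention, "->", normalise(mention))
```

In this sketch, reordered, pluralised, abbreviated and synonymous mentions all resolve to the same entry, while a mention with no counterpart in the toy terminology returns `None`. A hybrid method would combine several such techniques, and typically far more capable ones, rather than rely on exact key lookup.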