Automatic acquisition and classification of terminology using a tagged corpus in the molecular biology domain

Nigel Collier; Chikashi Nobata; Junichi Tsujii

doi:10.1075/term.7.2.07col

ISSN: 0929-9971
E-ISSN: 1569-9994


Source: Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication, Volume 7, Issue 2, Jan 2001, p. 239 - 257
DOI: https://doi.org/10.1075/term.7.2.07col

Abstract

This article describes our work to identify and classify terms in the domain of molecular biology according to examples that have been marked up by a domain expert in a corpus of abstracts taken from a controlled search of the Medline database. Automatic acquisition of biomedical term lists has so far been slow due to high variability in both the terms and their classification scheme, which we attribute to the diversity of research disciplines involved. Nevertheless, the explosive growth in online molecular biology literature makes a persuasive case for automating many tasks. These include the acquisition of records for gene-product databases such as SwissProt, which are currently updated by human experts, a task that is both time-consuming and often highly idiosyncratic. In this article we report results from a tool based on a hidden Markov model for extracting and classifying terms that can be used as a key component in an information extraction system. We discuss the results in light of lexical, syntactic and semantic properties of terms that were revealed by our study.
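To illustrate the general idea of classifying terms with a hidden Markov model, the following is a minimal sketch of Viterbi decoding over a token sequence. It is not the authors' tool: the label set (`O`, `PROTEIN`, `DNA`), the three word features, the cue-word list and every probability are hypothetical values chosen only to keep the example small. In practice such parameters would be estimated from an expert-tagged corpus.

```python
import math

# Hypothetical label set and toy probabilities, for illustration only.
STATES = ["O", "PROTEIN", "DNA"]

START = {"O": 0.8, "PROTEIN": 0.1, "DNA": 0.1}

TRANS = {
    "O":       {"O": 0.8, "PROTEIN": 0.1, "DNA": 0.1},
    "PROTEIN": {"O": 0.4, "PROTEIN": 0.5, "DNA": 0.1},
    "DNA":     {"O": 0.4, "PROTEIN": 0.1, "DNA": 0.5},
}

# Emission probabilities over coarse word features rather than raw words.
EMIT = {
    "O":       {"lower": 0.90, "alnum": 0.05, "dna_cue": 0.05},
    "PROTEIN": {"lower": 0.20, "alnum": 0.70, "dna_cue": 0.10},
    "DNA":     {"lower": 0.20, "alnum": 0.20, "dna_cue": 0.60},
}

# Assumed cue words that hint at a DNA-class term.
DNA_CUES = {"gene", "promoter", "enhancer"}


def feature(token):
    """Map a token to one of three coarse, assumed word features."""
    if token.lower() in DNA_CUES:
        return "dna_cue"
    if any(c.isdigit() or c.isupper() for c in token):
        return "alnum"
    return "lower"


def viterbi(tokens):
    """Return (token, label) pairs for the most probable label sequence."""
    if not tokens:
        return []
    obs = [feature(t) for t in tokens]
    # score[s] is the best log-probability of any path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # one dict of back-pointers per position after the first
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            pointers[s] = best
        back.append(pointers)
    # Follow back-pointers from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return list(zip(tokens, path))
```

Calling `viterbi(["the", "IL-2"])` with these toy parameters labels `the` as `O` and `IL-2` as `PROTEIN`. Working in log space avoids numerical underflow on longer sentences.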

Publication date: 2001-01-01


Article Type: Research Article

Keyword(s): information extraction; molecular biology; named entity

