Automatic taxonomy extraction for specialized domains using distributional semantics

Rogelio Nazar; Jorge Vivaldi; Leo Wanner

doi:10.1075/term.18.2.03naz

ISSN 0929-9971
E-ISSN: 1569-9994

Source: Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication, Volume 18, Issue 2, Jan 2012, p. 188 - 225
DOI: https://doi.org/10.1075/term.18.2.03naz

Abstract

This article explores a statistical, language-independent methodology for constructing taxonomies of specialized domains from noisy corpora. In contrast to proposals that exploit linguistic information by searching for lexico-syntactic patterns that tend to express the hypernymy relation, our methodology relies entirely on the distributional semantics of terms as captured by their lexical co-occurrence in large-scale corpora. In a first stage, we analyze the syntagmatic relations of the terms that serve as seeds of the taxonomy to be constructed, and thus obtain a first batch of hypernym candidates for our seed terms. In a second stage, we analyze the paradigmatic relations of the terms by inspecting which terms co-occur with prominent frequency with the terms found in the previous stage to be syntagmatically related to our seed terms; this allows us to refine the first batch of hypernym candidates and obtain new ones. In a third and final stage, we build a taxonomy from the resulting hypernym candidate lists, exploiting the asymmetric statistical association between terms that is characteristic of the hypernymy relation.
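The idea behind the third stage, that a hyponym tends to co-occur with its hypernym more reliably than the reverse, can be illustrated with a minimal sketch. This is not the authors' implementation: the conditional co-occurrence measure, the thresholds (`min_forward`, `min_gap`, `min_pair_count`), the function names, and the toy contexts are all assumptions chosen only to show what an asymmetric association test might look like.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(contexts):
    """Count in how many contexts each term and each unordered term pair occurs."""
    term_freq = Counter()
    pair_freq = Counter()
    for context in contexts:
        terms = set(context)
        term_freq.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_freq[(a, b)] += 1
    return term_freq, pair_freq


def hypernym_edges(contexts, min_forward=0.8, min_gap=0.2, min_pair_count=2):
    """Return (hyponym, hypernym) pairs whose association is strongly asymmetric.

    Hypothetical criterion: P(hypernym | hyponym) must be high and must exceed
    P(hyponym | hypernym) by at least min_gap.
    """
    term_freq, pair_freq = cooccurrence_counts(contexts)
    edges = []
    for (a, b), together in pair_freq.items():
        if together < min_pair_count:
            continue  # too little evidence for this pair
        for hypo, hyper in ((a, b), (b, a)):
            forward = together / term_freq[hypo]    # P(hyper | hypo)
            backward = together / term_freq[hyper]  # P(hypo | hyper)
            if forward >= min_forward and forward - backward >= min_gap:
                edges.append((hypo, hyper))
    return sorted(edges)


# Invented toy contexts (e.g. sentences reduced to their terms).
toy_contexts = [
    ["aspirin", "drug"],
    ["ibuprofen", "drug"],
    ["aspirin", "drug", "pain"],
    ["ibuprofen", "drug", "pain"],
    ["drug", "dose"],
    ["pain"],
]

edges = hypernym_edges(toy_contexts)
print(edges)
```

In this toy data, "aspirin" and "ibuprofen" always appear with "drug", while "drug" appears in many contexts without them, so only those two pairs pass the asymmetry test. A real system would need far larger corpora and a more robust association statistic.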


Article Type: Research Article

Keyword(s): distributional semantics; quantitative linguistics; taxonomy extraction; terminology extraction
