Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication - Volume 21, Issue 2, 2015
- The underpinnings of a composite measure for automatic term extraction: The case of SRC
Author(s): Carlos Periñán-Pascual
pp.: 151–179 (29)
The corpus-based identification of the lexical units that describe a given specialized domain is usually a complex task, in which an analysis oriented to the frequency of words and the likelihood of lexical associations is often ineffective. The goal of this article is to demonstrate that a user-adjustable composite metric such as SRC can accommodate the diversity of domain-specific glossaries to be constructed from small- and medium-sized specialized corpora of non-structured texts. Unlike most research in automatic term extraction, where single metrics are usually combined indiscriminately to produce the best results, SRC is grounded in the theoretical principles of salience, relevance and cohesion, which have been rationally implemented in the three components of this metric.
- Nested term recognition driven by word connection strength
Author(s): Malgorzata Marciniak and Agnieszka Mykowiecka
pp.: 180–204 (25)
Domain corpora are often not very voluminous, and even important terms can occur in them not as isolated maximal phrases but only within more complex constructions. Appropriate recognition of nested terms can thus influence the content of the extracted candidate term list and its order. We propose a new method for identifying nested terms based on a combination of two aspects: grammatical correctness and normalised pointwise mutual information (NPMI) computed for all bigrams in a given corpus. NPMI is typically used to recognise strong word connections, but in our solution we use it to recognise the weakest points, which suggest the best place to divide a phrase into two parts. By creating, at most, two nested phrases in each step, we introduce a binary term structure. We test the impact of the proposed method, applied together with the C-value ranking method, on the automatic term recognition task performed on three corpora, two in Polish and one in English.
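The idea of dividing a phrase at its weakest word connection can be illustrated with a minimal sketch. It is not the authors' implementation: it ignores the grammatical-correctness check, estimates probabilities from bigram counts only, and the function names `npmi_table` and `weakest_split` as well as the toy corpus are assumptions made for illustration. It uses the standard definition NPMI(x, y) = PMI(x, y) / -log p(x, y).

```python
import math
from collections import Counter


def npmi_table(sentences):
    """Compute NPMI for every adjacent word pair in tokenised sentences.

    Probabilities are estimated from bigram counts: p(x) is the share of
    bigrams that start with x, p(y) the share that end with y.
    """
    bigrams = Counter()
    for tokens in sentences:
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    left = Counter()   # how often a word opens a bigram
    right = Counter()  # how often a word closes a bigram
    for (x, y), count in bigrams.items():
        left[x] += count
        right[y] += count

    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / total
        if p_xy == 1.0:
            # Only one bigram type in the corpus: -log p(x, y) would be 0.
            scores[(x, y)] = 1.0
            continue
        pmi = math.log(p_xy / ((left[x] / total) * (right[y] / total)))
        scores[(x, y)] = pmi / -math.log(p_xy)
    return scores


def weakest_split(phrase, scores):
    """Split a phrase in two at the adjacent pair with the lowest NPMI.

    Pairs never seen in the corpus are treated as the weakest (-1.0).
    """
    if len(phrase) < 2:
        return phrase, []
    cut = min(
        range(len(phrase) - 1),
        key=lambda i: scores.get((phrase[i], phrase[i + 1]), -1.0),
    )
    return phrase[: cut + 1], phrase[cut + 1:]


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    ["acute", "renal", "failure"],
    ["renal", "failure"],
    ["renal", "failure"],
    ["acute", "pain"],
    ["chronic", "renal", "failure"],
]
scores = npmi_table(corpus)
head, tail = weakest_split(["acute", "renal", "failure"], scores)
print(head, tail)
```

In this toy corpus "renal failure" is the strongest connection, so the phrase is divided between "acute" and "renal", which leaves "renal failure" as a nested candidate. Applying the split recursively to each part would yield the kind of binary structure the abstract describes.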
- Extracting bilingual terms from the Web
Author(s): Robert Gaizauskas, Monica Lestari Paramita, Emma Barker, Marcis Pinnis, Ahmet Aker and Marta Pahisa Solé
pp.: 205–236 (32)
In this paper we make two contributions. First, we describe a multi-component system called BiTES (Bilingual Term Extraction System) designed to automatically gather domain-specific bilingual term pairs from Web data. BiTES components consist of data gathering tools, domain classifiers, monolingual text extraction systems and bilingual term aligners. BiTES is readily extendable to new language pairs and has been successfully used to gather bilingual terminology for 24 language pairs, including English and all official EU languages, save Irish. Second, we describe a novel set of methods for evaluating the main components of BiTES and present the results of our evaluation for six language pairs. Results show that the BiTES approach can be used to successfully harvest quality bilingual term pairs from the Web. Our evaluation method delivers significant insights about the strengths and weaknesses of our techniques. It can be straightforwardly reused to evaluate other bilingual term extraction systems and makes a novel contribution to the study of how to evaluate bilingual terminology extraction systems.
- The Sociopolitical Thesaurus as a resource for automatic document processing in Russian
Author(s): Natalia Loukachevitch and Boris Dobrov
pp.: 237–262 (26)
This paper presents the structure and current state of the Sociopolitical thesaurus, which was developed for automatic document analysis and information-retrieval applications in Russian in the broad domain of public affairs. The scope of the Sociopolitical thesaurus resembles that of traditional information-retrieval thesauri for broad domains, such as the EUROVOC or UNBIS thesauri, but the Sociopolitical thesaurus is intended as a tool for automatic document processing, and this difference leads to considerable distinctions in the thesaurus structure and the principles of its development. The knowledge representation in the Sociopolitical thesaurus combines three existing traditions: the development of information-retrieval thesauri, wordnets, and formal ontology research. This combination facilitates the consistent representation of such a broad scope of concepts and the automatic analysis of unstructured texts. The Sociopolitical thesaurus is used in such applications as conceptual indexing in information-retrieval systems, knowledge-based text categorization, automatic summarization of single and multiple documents, and question-answering. This paper presents an evaluation of the Sociopolitical thesaurus in automatic knowledge-based text categorization.
- Compositional translation of single-word complex terms using multilingual splitting
Author(s): Elizaveta Clouet, Rima Harastani, Béatrice Daille and Emmanuel Morin
pp.: 263–291 (29)
Multilingual terminology acquisition from comparable corpora has been attracting the interest of researchers for twenty years, but challenges still remain. Bilingual term alignment, a subtask of multilingual terminology acquisition, requires a pre-processing step, because term structure may differ according to the language. Morphologically constructed terms should be segmented in order to be aligned with their equivalents in other languages. This article addresses the translation of complex terms using a compositional approach. We focus on the pre-processing of such terms and introduce a domain-oriented splitting method that we apply to compound terms belonging to two domains and four languages. The segmentations are used as input to a translation step. We evaluate what percentage of segmentations can be correctly translated by a compositional approach, and which splitting strategy (precision- or recall-oriented) performs better. The results are compared to those obtained with the reference segmentations and with a corpus-based splitting method. Our method is close to the reference segmentation and outperforms the corpus-based method.
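The general pipeline of splitting a single-word compound and then translating it component by component can be sketched as follows. This is a generic lexicon-based illustration, not the domain-oriented method evaluated in the article: the functions `split_compound` and `translate_compositionally`, the component lexicon and the English-French component dictionary are all hypothetical, and linking elements between components are ignored.

```python
def split_compound(word, lexicon, min_len=3):
    """Return every segmentation of `word` into components found in `lexicon`.

    Components shorter than `min_len` characters are not considered.
    """
    word = word.lower()
    results = []
    if word in lexicon:
        results.append([word])
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head in lexicon:
            # Recursively segment the remainder of the word.
            for rest in split_compound(word[i:], lexicon, min_len):
                results.append([head] + rest)
    return results


def translate_compositionally(segments, dictionary):
    """Translate each component and concatenate the results.

    Returns None when any component has no dictionary entry.
    """
    translated = []
    for segment in segments:
        if segment not in dictionary:
            return None
        translated.append(dictionary[segment])
    return "".join(translated)


# Hypothetical component lexicon and English-to-French component dictionary.
lexicon = {"cardio", "vascular"}
en_fr = {"cardio": "cardio", "vascular": "vasculaire"}

segmentations = split_compound("cardiovascular", lexicon)
print(segmentations)
print(translate_compositionally(segmentations[0], en_fr))
```

Returning every possible segmentation rather than a single one leaves room for a later ranking step, which is where a precision-oriented or recall-oriented strategy of the kind mentioned in the abstract would make its choice.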