
Abstract
Automatic term extraction (ATE), the natural language processing (NLP) task of identifying domain-specific terms and returning a list of candidate terms, has long been challenging because term definitions vary strongly across domains. Leveraging advances in large language models (LLMs), we propose LlamATE, a framework for assessing the impact of domain specificity on ATE when prompting open-source LLM-based chat models, namely Llama-2-Chat, with in-context learning. We evaluate how well chat models fine-tuned with reinforcement learning from human feedback (RLHF) perform under different levels of domain-related information, i.e., in-domain and cross-domain demonstrations with and without explicit domain enunciation, on the ACTER datasets in the dominant language of NLP research (English) and in other European languages (e.g., French, Slovene). Furthermore, we examine the potential of cross-lingual and cross-domain prompting to reduce the need for extensive data annotation in the target domain and language. The results demonstrate the potential of implicit in-domain learning, where examples from the target domain serve as prompt demonstrations without the domain of each example being specified, and of cross-lingual learning, where knowledge is transferred from the dominant language to European languages that are less represented in the LLMs' pre-training data. LlamATE also offers a valuable compromise by reducing the need for extensive data annotation, making it suitable for real-world applications where labeled corpora are scarce. The source code is publicly available at the following link: https://github.com/honghanhh/terminology2024.
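As a rough illustration of the in-context learning setup described above, the following Python sketch assembles a few-shot ATE prompt with in-domain or cross-domain demonstrations and an optional domain enunciation toggle. The function name, prompt wording, and example format are illustrative assumptions, not the paper's exact templates; those are available in the linked repository.

```python
# Minimal sketch of a few-shot ATE prompt for an LLM-based chat model.
# The template wording and the `domain` toggle are assumptions for
# illustration; see the repository for the actual prompts.

def build_ate_prompt(demonstrations, query_sentence, domain=None):
    """Assemble an in-context learning prompt for term extraction.

    demonstrations: list of (sentence, [terms]) pairs, drawn either from
        the target domain (in-domain) or from another domain (cross-domain),
        and possibly from another language (cross-lingual).
    domain: optional domain name; when given, the prompt enunciates the
        domain explicitly, otherwise the domain is left implicit.
    """
    header = "Extract the domain-specific terms from the sentence."
    if domain is not None:
        header = f"Extract the {domain} terms from the sentence."

    shots = [
        f"Sentence: {sentence}\nTerms: {', '.join(terms)}"
        for sentence, terms in demonstrations
    ]
    return "\n\n".join([header, *shots, f"Sentence: {query_sentence}\nTerms:"])


if __name__ == "__main__":
    demos = [
        ("Patients received a corticosteroid injection.",
         ["corticosteroid", "injection"]),
    ]
    # Explicit domain enunciation; pass domain=None for the implicit variant.
    print(build_ate_prompt(demos,
                           "The biopsy revealed a malignant tumour.",
                           domain="medical"))
```

The resulting string would then be sent to the chat model; comparing the implicit (domain=None) and explicit variants corresponds to the with/without domain enunciation conditions studied in the paper.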