1887
Volume 31, Issue 1
  • ISSN 0929-9971
  • E-ISSN: 1569-9994

Abstract

Over the past decades, automatic term (or terminology) extraction (ATE), a natural language processing (NLP) task that aims to identify terms from specific domains by providing a list of candidate terms, has remained challenging because term definitions depend strongly on the domain. Leveraging advances in large-scale language models (LLMs), we propose a framework to verify the impact of domain specificity on ATE when using in-context learning prompts with open-source LLM-based chat models. We evaluate how well LLM-based chat models (e.g., those trained using reinforcement learning from human feedback (RLHF)) perform with different levels of domain-related information, i.e., in-domain and cross-domain demonstrations with and without domain enunciation, in the dominant language of NLP research (English) and in other European languages (e.g., French, Slovene) from ACTER datasets. Furthermore, we examine the potential of cross-lingual and cross-domain prompting to reduce the need for extensive data annotation in the target domain and language. The results demonstrate the potential of two settings: implicit in-domain learning, where examples from the target domain are used as demonstrations in the prompts without specifying the domain of each example, and cross-lingual learning, where knowledge is transferred from the dominant language to European languages that are less represented in the data used to pre-train the LLMs. The framework also offers a valuable compromise by reducing the need for extensive data annotation, making it suitable for real-world applications where labeled corpora are scarce. The source code is publicly available at the following link: https://github.com/honghanhh/terminology2024.
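
To make the prompting settings in the abstract concrete, the following is a minimal sketch of how few-shot demonstrations might be assembled with and without domain enunciation. It is an illustration under stated assumptions, not the authors' implementation: the instruction wording, the `Demonstration` class, the `build_prompt` function, and the example sentences are all hypothetical, and the actual templates may differ (see the linked repository).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Demonstration:
    """One annotated example used as an in-context demonstration (hypothetical structure)."""
    sentence: str
    terms: List[str]
    domain: Optional[str] = None


def format_demo(demo: Demonstration, announce_domain: bool) -> str:
    """Render one demonstration, naming its domain only when requested."""
    lines = []
    if announce_domain and demo.domain:
        lines.append(f"Domain: {demo.domain}")
    lines.append(f"Sentence: {demo.sentence}")
    lines.append("Terms: " + "; ".join(demo.terms))
    return "\n".join(lines)


def build_prompt(
    demos: List[Demonstration],
    target_sentence: str,
    target_domain: Optional[str] = None,
    announce_domain: bool = False,
) -> str:
    """Assemble an instruction, the demonstrations, and the unanswered query."""
    # The instruction text is an assumption made for this sketch.
    parts = ["Extract the domain-specific terms from the sentence."]
    parts += [format_demo(d, announce_domain) for d in demos]
    query = []
    if announce_domain and target_domain:
        query.append(f"Domain: {target_domain}")
    query.append(f"Sentence: {target_sentence}")
    query.append("Terms:")
    parts.append("\n".join(query))
    return "\n\n".join(parts)


# Made-up example data, for illustration only.
in_domain = [
    Demonstration(
        sentence="Beta blockers reduce mortality in chronic heart failure.",
        terms=["beta blockers", "mortality", "chronic heart failure"],
        domain="heart failure",
    )
]
cross_domain = [
    Demonstration(
        sentence="The rotor blades drive the turbine generator.",
        terms=["rotor blades", "turbine generator"],
        domain="wind energy",
    )
]
target = "Ejection fraction was measured by echocardiography."

# Implicit in-domain setting: target-domain examples, no domain named.
implicit_prompt = build_prompt(in_domain, target)

# Cross-domain setting with domain enunciation: each example names its domain.
explicit_prompt = build_prompt(
    cross_domain, target, target_domain="heart failure", announce_domain=True
)
```

In this sketch, the in-domain versus cross-domain choice is only a matter of which demonstrations are passed in, while the `announce_domain` flag controls whether the domain is stated. Cross-lingual prompting would, under the same assumptions, pass demonstrations in one language and a target sentence in another.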

/content/journals/10.1075/term.00082.tra
2025-05-23
2025-06-24