Volume 24, Issue 1
  • ISSN 0929-9971
  • E-ISSN: 1569-9994


Terms identified computationally in domain-specific corpora have to be validated manually by specialists, which is highly time-consuming. To reduce this effort and improve term candidate selection, we implemented the Token Slot Recognition method, a filtering method based on terminological tokens that is used to rank term candidates extracted from domain-specific corpora. This paper presents the implementation of this term candidate filtering method within linguistic and statistical approaches to automatic term extraction, using several domain-specific corpora in different languages. We observed that the filtering method improves term candidate selection: it ranks more terms at the top of the term candidate list than raw frequency does, and for statistical term extraction the improvement is between 15% and 25% in both precision and recall. Our analyses further revealed a reduction in the number of term candidates to be validated manually by specialists. In conclusion, the Token Slot Recognition filtering method significantly reduces the number of automatically extracted term candidates, so specialists can validate them more easily and quickly.
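
The abstract reports gains in precision and recall but does not spell out how a ranked candidate list is scored. The following is a minimal sketch, assuming the common top-N evaluation against a manually validated term list. The function name, the candidate lists, and the reference set are all hypothetical and do not come from the paper; in particular, the "filtered" ordering is invented for illustration and is not the output of Token Slot Recognition.

```python
def precision_recall_at_n(ranked_candidates, reference_terms, n):
    """Score the top-n candidates of a ranked list against validated terms.

    precision = validated terms in the top n / candidates in the top n
    recall    = validated terms in the top n / all validated terms
    """
    top_n = ranked_candidates[:n]
    hits = sum(1 for candidate in top_n if candidate in reference_terms)
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(reference_terms) if reference_terms else 0.0
    return precision, recall


# Hypothetical validated term list (stands in for the specialists' decisions).
reference = {"term extraction", "corpus", "token", "terminology"}

# Hypothetical rankings of the same six candidates: one by raw frequency,
# one after some filtering step has reordered them.
by_frequency = ["the method", "corpus", "this paper",
                "token", "term extraction", "terminology"]
filtered = ["corpus", "term extraction", "token",
            "this paper", "terminology", "the method"]

freq_p, freq_r = precision_recall_at_n(by_frequency, reference, 3)
filt_p, filt_r = precision_recall_at_n(filtered, reference, 3)

print(f"raw frequency: precision={freq_p:.2f} recall={freq_r:.2f}")
print(f"filtered:      precision={filt_p:.2f} recall={filt_r:.2f}")
```

Scoring both rankings at the same cut-off N shows why moving true terms toward the top of the list matters: a specialist who reviews only the first N candidates sees more valid terms and fewer candidates to reject.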





