Volume 24, Issue 1
  • ISSN 0929-9971
  • E-ISSN: 1569-9994

Abstract

Terms identified in domain-specific corpora by computational methods must be validated manually by specialists, which is a highly time-consuming activity. To reduce this effort and improve term candidate selection, we implemented the Token Slot Recognition method, a filtering method based on terminological tokens that ranks term candidates extracted from domain-specific corpora. This paper presents how we applied this filtering method within both linguistic and statistical approaches to automatic term extraction, using several domain-specific corpora in different languages. We observed that the filtering method improves term candidate selection: it places more terms at the top of the candidate list than ranking by raw frequency does, and for statistical term extraction it improves both precision and recall by between 15% and 25%. Our analyses also showed a reduction in the number of term candidates that specialists must validate manually. In conclusion, the Token Slot Recognition filtering method significantly reduces the number of term candidates extracted automatically from domain-specific corpora, so specialists can validate them more easily and quickly.
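The abstract does not define how token slots are built or checked, so what follows is only a minimal sketch under stated assumptions. It assumes that a reference terminology records which position (slot) each token occupies in known terms, that a candidate passes when its first and last tokens occupy the same slots in some known term, and that passing candidates are moved above the rest of a frequency-ranked list. The slot labels, the function names (`build_slot_index`, `passes_token_slots`, `rank_with_filter`, `precision_at`) and the toy data are hypothetical, and the actual Token Slot Recognition method may differ. The sketch also compares precision over the top n candidates against a raw-frequency baseline, which is the kind of comparison the abstract reports.

```python
from collections import defaultdict

# Hypothetical token-slot filter; the slot scheme below is an assumption,
# not the published definition of Token Slot Recognition.


def slot_of(position, length):
    """Assumed slot scheme: a token is 'single', 'first', 'middle' or 'last'."""
    if length == 1:
        return "single"
    if position == 0:
        return "first"
    if position == length - 1:
        return "last"
    return "middle"


def build_slot_index(reference_terms):
    """Record, for each lowercased token, the slots it occupies in known terms."""
    index = defaultdict(set)
    for term in reference_terms:
        tokens = term.lower().split()
        for i, token in enumerate(tokens):
            index[token].add(slot_of(i, len(tokens)))
    return index


def passes_token_slots(candidate, index):
    """Accept a candidate if its edge tokens fill the same slots in known terms.

    Only the first and last tokens are checked; middle tokens are ignored.
    """
    tokens = candidate.lower().split()
    n = len(tokens)
    first_ok = slot_of(0, n) in index.get(tokens[0], set())
    last_ok = slot_of(n - 1, n) in index.get(tokens[-1], set())
    return first_ok and last_ok


def rank_by_frequency(freqs):
    """Baseline: order candidates by descending raw frequency (ties alphabetical)."""
    return sorted(freqs, key=lambda c: (-freqs[c], c))


def rank_with_filter(freqs, index):
    """Move candidates that pass the slot filter to the top, keeping frequency order."""
    return sorted(freqs, key=lambda c: (not passes_token_slots(c, index), -freqs[c], c))


def precision_at(ranked, gold, n):
    """Fraction of the top-n ranked candidates that are true terms."""
    top = ranked[:n]
    return sum(c in gold for c in top) / len(top) if top else 0.0


# Toy data (invented for illustration only).
reference = ["blood pressure", "heart rate", "blood cell", "cell membrane", "heart failure"]
freqs = {"of the": 12, "the patient": 9, "heart cell": 4, "blood flow rate": 3}
gold = {"heart cell", "blood flow rate"}

index = build_slot_index(reference)
baseline = rank_by_frequency(freqs)
filtered = rank_with_filter(freqs, index)
print(precision_at(baseline, gold, 2))  # 0.0
print(precision_at(filtered, gold, 2))  # 1.0
print([c for c in filtered if passes_token_slots(c, index)])  # ['heart cell', 'blood flow rate']
```

Sorting on the tuple `(not passes, -frequency, candidate)` keeps the frequency baseline intact inside each group, so any gain at the top of the list comes from the filter alone. Discarding rejected candidates instead of demoting them would shorten the list that specialists have to validate, which corresponds to the reduction described in the abstract.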

DOI: 10.1075/term.00016.vaz
2018-05-31
2019-12-08