Volume 22, Issue 2
  • ISSN 0929-9971
  • E-ISSN: 1569-9994

Abstract

This paper presents the first results of a new method for terminology extraction based on distributional analysis. The intuition behind the algorithm is that single- or multi-word lexical units that refer to specialised concepts show a characteristic co-occurrence pattern: a tendency to appear in the same contexts as other conceptually related terms. For example, a given term will systematically appear in the same sentences as other terms related to it. Terms also co-occur with general vocabulary units, of course, but not with the characteristic pattern that arises when a conceptual relation holds. The method was evaluated experimentally on a corpus of psychiatry journals from Spain and Latin America, and the evaluation concluded that its results are significantly better than those of other methods.
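
The co-occurrence intuition described in the abstract can be illustrated with a small sketch. This is not the algorithm of the paper, whose details are not given here; it only counts, for each lexical unit, which other units share a sentence with it. The function names (cooccurrence_counts, top_partners) and the toy sentences are hypothetical and invented for illustration, and the input is assumed to be already segmented into sentences and lexical units.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, for every unit, how often each other unit shares a sentence with it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        units = sorted(set(sentence))  # count each unit at most once per sentence
        for a, b in combinations(units, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def top_partners(counts, unit, k=3):
    """Return the k units that most often share a sentence with `unit`."""
    return counts[unit].most_common(k)


# Toy, invented sentences (assumption: already tokenised); not data from the paper.
sentences = [
    ["patient", "depression", "anxiety", "study"],
    ["depression", "anxiety", "treatment"],
    ["study", "result", "method"],
    ["depression", "anxiety", "patient"],
]

counts = cooccurrence_counts(sentences)
print(top_partners(counts, "depression", k=1))  # [('anxiety', 3)]
```

In this toy data the unit "depression" has one dominant partner, whereas a general-vocabulary unit such as "study" is spread evenly over unrelated partners. A real system would need a principled measure of that difference, which this sketch does not attempt.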

/content/journals/10.1075/term.22.2.01naz
2017-02-10
2019-10-22

  • Article Type: Research Article
  • Keyword(s): co-occurrence, distributional semantics, terminology extraction, text-mining, topic signatures