Volume 24, Issue 1
  • ISSN: 0929-9971
  • E-ISSN: 1569-9994

Abstract

The present article explores two novel methods that integrate distributed representations with terminology extraction. Both methods assess the specificity of a word (unigram) to the target corpus by leveraging its distributed representation in the target domain as well as in the general domain. The first approach uses this distributed specificity as a filter, and the second applies it directly to the corpus. The filter can be mounted on any other Automatic Terminology Extraction (ATE) method, allows any number of other ATE methods to be merged, and achieves remarkable results with minimal training. The direct approach does not perform as well as the filtering approach, but it reaffirms that, when distributed specificity serves as the words' representation, very little data is required to train an ATE classifier. This encourages the development of more minimally supervised ATE algorithms in the future.
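The abstract does not say how distributed specificity is computed or which classifier is trained on it. As a minimal sketch only, the Python below assumes that a word is represented by concatenating its target-domain and general-domain embedding vectors, and that a logistic-regression classifier trained on a handful of labeled words decides termhood. The filter merges the candidate lists of any number of other ATE methods and keeps only the words the classifier accepts. All function names (`specificity_features`, `train_logreg`, `is_term`, `specificity_filter`), the two-dimensional toy vectors, and the candidate lists are hypothetical; this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Embeddings = Dict[str, Vector]


def specificity_features(word: str, target_vecs: Embeddings,
                         general_vecs: Embeddings) -> Vector:
    """ASSUMPTION: represent a word by its target-domain vector concatenated
    with its general-domain vector (zeros if the general model lacks it)."""
    tgt = target_vecs[word]
    gen = general_vecs.get(word, [0.0] * len(tgt))
    return tgt + gen


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Vector, x: Vector) -> float:
    """Linear score; w holds one weight per feature followed by a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def train_logreg(xs: Sequence[Vector], ys: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Vector:
    """Logistic regression fitted by plain batch gradient descent."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sigmoid(score(w, x)) - y  # d(log loss)/d(score)
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def is_term(w: Vector, word: str, target_vecs: Embeddings,
            general_vecs: Embeddings) -> bool:
    x = specificity_features(word, target_vecs, general_vecs)
    return sigmoid(score(w, x)) >= 0.5


def specificity_filter(ate_outputs: Sequence[Sequence[str]], w: Vector,
                       target_vecs: Embeddings,
                       general_vecs: Embeddings) -> List[str]:
    """Merge candidates from any number of ATE methods (union, first-seen
    order), then keep those the classifier accepts. Words without a
    target-domain vector cannot be scored and are skipped."""
    merged: List[str] = []
    seen = set()
    for candidates in ate_outputs:
        for word in candidates:
            if word in target_vecs and word not in seen:
                seen.add(word)
                merged.append(word)
    return [wd for wd in merged if is_term(w, wd, target_vecs, general_vecs)]


# Toy 2-d vectors, invented for illustration only.
target = {"derivative": [0.9, 0.1], "vector": [0.8, 0.3],
          "the": [0.1, 0.9], "example": [0.2, 0.8],
          "asymptote": [0.85, 0.2], "page": [0.15, 0.85]}
general = {"derivative": [0.2, 0.8], "vector": [0.3, 0.7],
           "the": [0.9, 0.1], "example": [0.8, 0.3],
           "asymptote": [0.25, 0.75], "page": [0.85, 0.2]}

# Minimal supervision: four labeled words (1 = term, 0 = non-term).
train_words = ["derivative", "vector", "the", "example"]
labels = [1, 1, 0, 0]
weights = train_logreg(
    [specificity_features(t, target, general) for t in train_words], labels)

# Outputs of two hypothetical ATE methods; "theorem" has no toy vector.
method_a = ["asymptote", "the"]
method_b = ["page", "asymptote", "theorem"]
print(specificity_filter([method_a, method_b], weights, target, general))
# → ['asymptote']
```

Calling `is_term` on a word by itself corresponds loosely to the direct approach, while `specificity_filter` corresponds loosely to the filtering approach. A linear classifier keeps the sketch dependency-free, but because it scores the two halves of the concatenated vector additively, it cannot capture every interaction between the two spaces; a nonlinear classifier may be a better fit in practice.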

DOI: 10.1075/term.00012.amj
2018-05-31
2019-08-21
