Utilising heterogeneous language resources for term extraction in maritime domains

Abstract

Developing terminologies for domains that lack them is time-consuming and costly. This article addresses a general methodological question: how can existing language resources be utilised to the fullest, with limited funding, to create a terminology at relatively low cost? Although Norway has been an important player in the maritime industries for many centuries, it has not prioritised the systematic development of an official maritime terminology. The article therefore focuses specifically on efforts to develop a national resource for maritime domains. It describes efforts to create a corpus of popular science and a parallel corpus of technical texts. Six different term extraction methods are applied. These include corpus-based statistical analyses of frequency, collocation and keyness, as well as bilingual term extraction. Finally, the pros and cons of each method are evaluated by means of a cost-benefit analysis.
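
To illustrate what a keyness analysis of the kind mentioned above might involve, the following is a minimal sketch that ranks words in a domain corpus against a general reference corpus by the log-likelihood statistic. It is not the article's implementation: the function names, the `min_freq` threshold and the choice of log-likelihood as the keyness measure are assumptions made for illustration, and both token lists are assumed to be non-empty.

```python
import math
from collections import Counter


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) score for one word observed in two corpora."""
    total = size_target + size_ref
    combined = freq_target + freq_ref
    # Expected frequencies if the word were equally likely in both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    if freq_target > 0:
        g2 += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * g2


def rank_candidates(target_tokens, ref_tokens, min_freq=2):
    """Rank words that are relatively more frequent in the target corpus.

    min_freq is a hypothetical cut-off used here only to drop rare words.
    """
    target = Counter(target_tokens)
    ref = Counter(ref_tokens)
    n_target, n_ref = len(target_tokens), len(ref_tokens)
    scored = []
    for word, f_target in target.items():
        f_ref = ref[word]
        if f_target < min_freq:
            continue
        # Skip words that are not over-represented in the target corpus.
        if f_target / n_target <= f_ref / n_ref:
            continue
        scored.append((word, log_likelihood(f_target, n_target, f_ref, n_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_candidates(domain_tokens, general_tokens)` on two tokenised corpora returns `(word, score)` pairs in descending order of keyness. The top of such a list is typically treated as a set of term candidates for manual validation, not as a finished terminology.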

/content/journals/10.1075/term.20024.and
2021-09-10
2021-12-04