Volume 24, Issue 2
  • ISSN 0929-9971
  • E-ISSN: 1569-9994



This article examines the status of constructed controlled terminologies from the perspective of the coverage of terms and concepts. To facilitate controlled authoring of Japanese texts in the municipal domain and to promote machine translatability into English, we constructed terminologies in two steps: (1) Japanese-English term pairs were extracted from aligned texts; (2) term variations were controlled by defining preferred and proscribed terms for both languages. To assess the coverage of the constructed terminologies, we propose a quantitative extrapolation method that estimates the potential vocabulary size. The estimations show that the coverage of terms for Japanese is higher than that for English by about 10%, which reflects the greater diversity of the translated English terms. The coverage of concepts reaches around 60% for both Japanese and English. The method also enables us to estimate quantitatively how much effort is needed to increase the coverage further.
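The coverage idea in the abstract can be illustrated with a small sketch: coverage is the number of distinct terms already collected divided by an estimate of the potential vocabulary size. The abstract does not name the extrapolation model used, so the sketch below substitutes the Chao1 lower-bound estimator as a simple stand-in; it is an assumption for illustration only, not the method of the article. The function names and the toy frequency data are hypothetical.

```python
from collections import Counter


def chao1_estimate(freqs):
    """Chao1 lower-bound estimate of the total number of term types.

    freqs maps each observed term to its frequency in the corpus.
    Chao1 is a stand-in here, not the estimator used in the article.
    """
    observed = len(freqs)
    spectrum = Counter(freqs.values())  # frequency -> number of types
    f1 = spectrum.get(1, 0)  # types seen exactly once
    f2 = spectrum.get(2, 0)  # types seen exactly twice
    if f2 > 0:
        unseen = f1 * f1 / (2 * f2)
    else:
        # Bias-corrected form, used when there are no doubletons.
        unseen = f1 * (f1 - 1) / 2
    return observed + unseen


def coverage(freqs):
    """Share of the estimated potential vocabulary already observed."""
    estimate = chao1_estimate(freqs)
    return len(freqs) / estimate if estimate else 0.0


# Hypothetical toy data: municipal-domain terms with corpus frequencies.
toy = {"住民票": 5, "転入届": 2, "国民健康保険": 2, "印鑑登録": 1, "児童手当": 1}
print(f"estimated vocabulary size: {chao1_estimate(toy):.1f}")  # 6.0
print(f"estimated coverage: {coverage(toy):.2f}")  # 0.83
```

Running the same calculation separately on the Japanese and the English term lists would give one coverage figure per language, which is the kind of comparison the abstract reports.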




