Volume 34, Issue 2
  • ISSN: 0924-1884
  • E-ISSN: 1569-9986



There is still much to learn about the ways in which human and machine translation differ with regard to the contexts that regulate the production and interpretation of discourse. The present study explores whether a corpus-driven lexical analysis of human and machine translation can unveil discourse features that set the two apart. A balanced corpus of source texts aligned with authentic, professional translations and neural machine translations was compiled for the study. Lexical discrepancies in the two translation corpora were then extracted via a corpus-driven keyword analysis, and examined qualitatively through parallel concordances of source texts aligned with human and machine translation. The study shows that keyword analysis not only reiterates known problems of discourse in machine translation such as lexical inconsistency and pronoun resolution, but can also provide valuable insights regarding contextual aspects of translated discourse deserving further research.
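To make the keyword-analysis step concrete, the following is a minimal sketch of one common way to extract keywords that distinguish one corpus from another: a smoothed ratio of normalised frequencies (often called the "simple maths" score). The abstract does not specify which keyness statistic or tool settings were used, so the scoring formula, the smoothing constant, the whitespace-style tokenisation, and the function names `tokenize`, `keyword_scores`, and `top_keywords` are all assumptions made for illustration. The two example sentences are invented toy data, not material from the study.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def keyword_scores(focus_tokens, reference_tokens, smoothing=1.0):
    """Score every word by (focus freq per million + k) / (reference freq per million + k).

    Scores above 1 mark words that are relatively more frequent in the focus
    corpus; scores below 1 mark words more typical of the reference corpus.
    The smoothing constant k is an assumed default, not a value from the study.
    """
    if not focus_tokens or not reference_tokens:
        raise ValueError("both corpora must contain at least one token")

    focus_counts = Counter(focus_tokens)
    reference_counts = Counter(reference_tokens)
    focus_total = len(focus_tokens)
    reference_total = len(reference_tokens)

    scores = {}
    for word in set(focus_counts) | set(reference_counts):
        focus_pm = focus_counts[word] * 1_000_000 / focus_total
        reference_pm = reference_counts[word] * 1_000_000 / reference_total
        scores[word] = (focus_pm + smoothing) / (reference_pm + smoothing)
    return scores


def top_keywords(focus_text, reference_text, n=10, smoothing=1.0):
    """Return the n highest-scoring (word, score) pairs for the focus text."""
    scores = keyword_scores(tokenize(focus_text), tokenize(reference_text), smoothing)
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    # Invented toy sentences standing in for a human and a machine translation.
    human = "the doctor said she would call the patient"
    machine = "the doctor said he would call the patient"
    print(top_keywords(human, machine, n=3))
    print(top_keywords(machine, human, n=3))
```

Running the comparison in both directions, as in the usage lines, surfaces the words each corpus over-uses relative to the other. The resulting keywords would then be inspected in parallel concordances, which is the qualitative step the abstract describes.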

Available under the CC BY 4.0 license.




