How well do Large Language Models handle Machine Translation?

Abstract

We conducted a comparative reception analysis of Spanish‑to‑Czech translations produced by OpenAI’s GPT‑3.5 and GPT‑4 and by DeepSeek‑V3 across two text domains (marketing and literary), two evaluation criteria (naturalness and grammar), and two prompting strategies (simple vs. detailed). We also assessed the consistency of these translations using the Levenshtein distance. In a reading task, 132 Czech native speakers rated the translations on a 5‑point Likert scale. Contrary to previous findings, simple prompts produced reliably higher‑quality translations than detailed prompts. Literary translations were rated lower than marketing translations, and grammar ratings exceeded naturalness ratings. GPT‑4 outperformed the other two models only on the literary translations. Finally, DeepSeek‑V3 showed greater consistency but lower quality in literary translation, suggesting that increased consistency may come at the expense of creativity. These findings provide empirical insights into how prompting strategy, text type, and model choice may influence machine translation quality.
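
To illustrate the kind of consistency measure mentioned above, the following is a minimal sketch of a character-level Levenshtein distance and a mean pairwise distance over repeated translations of the same source text. The abstract does not state whether the distance was computed over characters or words, nor whether it was normalized, so the length normalization and the helper name `mean_pairwise_distance` are assumptions made for illustration only, not the study's actual procedure.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def mean_pairwise_distance(outputs: list[str]) -> float:
    """Hypothetical consistency score: average length-normalized
    distance over all pairs of outputs (0.0 means identical outputs).
    The normalization by the longer string is an assumption."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for x, y in pairs:
        longest = max(len(x), len(y))
        total += levenshtein(x, y) / longest if longest else 0.0
    return total / len(pairs)
```

Under this sketch, a lower mean pairwise distance across repeated runs of the same prompt would correspond to a more consistent model.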

/content/journals/10.1075/ts.25064.gut
2026-05-05
2026-05-11