Volume 11, Issue 1
  • ISSN: 2211-3711
  • E-ISSN: 2211-372X



Recent developments in neural machine translation, and especially speech translation, are gradually but firmly entering the field of audiovisual translation (AVT). Automation in subtitling is extending from a machine translation (MT) component to fully automatic subtitling, which comprises MT, auto-spotting and automatic segmentation. The rise of this new paradigm renders MT-oriented experimental designs inadequate for the evaluation and investigation of automatic subtitling, since they fail to encompass the multimodal nature and technical requirements of subtitling. This paper highlights the methodological gaps to be addressed by multidisciplinary efforts in order to overcome these inadequacies and obtain metrics and methods that lead to rigorous experimental research in automatic subtitling. It presents a review of previous experimental designs in MT for subtitling, identifies their limitations for conducting research under the new paradigm and proposes a set of recommendations towards achieving replicability and reproducibility in experimental research at the crossroads between AVT and MT.
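The technical requirements that string-based MT metrics fail to capture can be made concrete. The sketch below (illustrative only, not a method proposed in the paper) checks each subtitle block against common industry conventions: a maximum of two lines, a characters-per-line (CPL) limit of 42, and a reading speed of 21 characters per second (CPS). The thresholds and the `subtitle_conformity` helper are assumptions chosen for illustration, not values prescribed by this article.

```python
# Illustrative sketch: automatic subtitling must satisfy technical
# constraints (line length, reading speed) that BLEU-style MT metrics
# ignore entirely. Thresholds below are common conventions, assumed here.

def subtitle_conformity(blocks, max_cpl=42, max_cps=21.0, max_lines=2):
    """Fraction of subtitle blocks meeting line-count, CPL and CPS limits.

    `blocks` is a list of dicts: {"lines": [str, ...], "start": s, "end": s}.
    """
    ok = 0
    for b in blocks:
        duration = b["end"] - b["start"]
        chars = sum(len(line) for line in b["lines"])
        cps = chars / duration if duration > 0 else float("inf")
        if (len(b["lines"]) <= max_lines
                and all(len(line) <= max_cpl for line in b["lines"])
                and cps <= max_cps):
            ok += 1
    return ok / len(blocks) if blocks else 0.0

subs = [
    {"lines": ["Recent advances in speech translation",
               "are reshaping subtitling."],
     "start": 0.0, "end": 3.0},
    {"lines": ["A single line that is far too long to fit the 42-character limit"],
     "start": 3.0, "end": 3.5},
]
print(subtitle_conformity(subs))  # 0.5: second block violates CPL and CPS
```

A conformity score of this kind would complement, not replace, translation-quality metrics: a subtitle can be a perfect translation yet unusable on screen.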






  • Article Type: Research Article
Keyword(s): evaluation; neural machine translation; post-editing; speech translation; subtitling