Volume 6, Issue 2
  • ISSN: 2211-3711
  • E-ISSN: 2211-372X


The last few years have witnessed a surge of interest in a new machine translation paradigm: neural machine translation (NMT). NMT is starting to displace its corpus-based predecessor, statistical machine translation (SMT). In this paper, I introduce NMT and explain in detail, without the mathematical complexity, how NMT systems work, how they are trained, and how they differ from SMT systems. The paper will try to decipher NMT jargon such as “distributed representations”, “deep learning”, “word embeddings”, “vectors”, “layers”, “weights”, “encoder”, “decoder”, and “attention”, and build upon these concepts so that individual translators and translation-industry professionals, as well as students and academics in translation studies, can make sense of this new technology and know what to expect from it. Aspects such as how NMT output differs from SMT output, the hardware and software requirements of NMT at both training time and run time, and the impact of these requirements on the translation industry will also be discussed.
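To give a concrete feel for three of the terms above (“word embeddings”, “vectors”, and “attention”), here is a toy sketch in plain Python. It rests on loud assumptions: the three-dimensional vectors in `EMBEDDINGS` are made up (real systems learn vectors with hundreds of dimensions from data), and `cosine`, `softmax`, and `attend` are hypothetical helper names chosen for illustration. The attention step uses simple dot-product scoring, which is only one possible scoring function and not necessarily the one any particular NMT system uses.

```python
import math

# Hypothetical toy word embeddings. The values are invented for
# illustration; an NMT system would learn such vectors during training.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.3],
    "dog": [0.8, 0.2, 0.35],
    "house": [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Simplified dot-product attention (an assumption, see lead-in).

    Scores each encoder state (one vector per source word) against the
    decoder's current state (the query), normalises the scores with
    softmax, and returns the weights plus the weighted-average
    "context" vector the decoder would use for its next word.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Related words get more similar vectors than unrelated ones.
    print(round(cosine(EMBEDDINGS["cat"], EMBEDDINGS["dog"]), 3))
    print(round(cosine(EMBEDDINGS["cat"], EMBEDDINGS["house"]), 3))
    # Attention over a two-word "source sentence".
    source = [EMBEDDINGS[w] for w in ("cat", "house")]
    weights, context = attend([1.0, 0.0, 0.0], source)
    print([round(w, 3) for w in weights])
```

With these invented numbers, “cat” comes out closer to “dog” than to “house”, and a query vector resembling “cat” puts most of its attention weight on the first source word. The point of the sketch is only the shape of the computation: similarity between vectors, then a weighted average steered by those similarities.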

