Volume 173, Issue 1
  • ISSN 0019-0829
  • E-ISSN: 1783-1490

Most of the texts that second language learners engage with include both text (written and/or spoken) and images. The use of images accompanying texts is believed to support reading comprehension and facilitate learning. Despite the widespread use of images, very little is known about how the presentation of multiple input sources affects the attentional demands and the underlying cognitive processes involved. This paper provides a review of research on multimodal reading, with a focus on attentional processing. It first introduces the relevant theoretical frameworks and the empirical evidence provided in support of the use of pictures in reading. It then reviews studies that have looked at the processing of text and pictures in first and second language contexts. Based on this review, the main gaps in research and future research directions are identified. The discussion provided in this paper aims to advance research on multimodal reading in a second language. Achieving a better understanding of the underlying cognitive processes in multimodal reading is crucial to inform pedagogical practices and to develop theoretical accounts of second language multimodal reading.




