Exploring large language models for L2 metaphonological awareness tutoring

Abstract

This is the first observational study to evaluate the feasibility of implementing large language models (LLMs) for second language (L2) metaphonological awareness training. A custom implementation of GPT-4 acting as a homework tutor was piloted in an English phonetics and phonology course for first-year university students. Two novel homework assignments were designed to leverage the LLM’s strengths and explore its weaknesses. Analysis of learner interaction logs, homework reflections, and survey data revealed that most learners perceived the AI Tutor as helpful for its personalised explanations. However, the overall sentiment was mixed due to the LLM’s propensity for confabulation. Despite these challenges, the pilot demonstrated the potential for LLMs to engage learners in active and self-regulated learning. Recommendations for future directions include designing LLM-based learning environments, promoting AI literacy among educators and learners, and experimentally researching long-term effects of AI tutors on learning outcomes.

/content/journals/10.1075/jslp.24030.lod
2025-06-02
2025-06-24

  • Article Type: Research Article
Keywords: AI; CAPT; L2 pronunciation; artificial intelligence