An empirical study on GenAI use in speech difficulty evaluation
Toward a human-centered application of AI in interpreting education
Source: InContext, Volume 5, Issue 1, May 2025, pp. 116-145
31 May 2025
Abstract
This study examines the use of Artificial Intelligence Generated Content (AIGC) tools for assessing speech difficulty in interpreter training. Twenty-five students were invited to interpret three speeches consecutively from English into Chinese and then to evaluate the difficulty of those speeches, while ChatGPT was given the transcripts and the duration of each speech. The students’ evaluations were compared with ChatGPT’s within a standardized framework, the Speech Difficulty Index (SDI). Statistical analyses, specifically one-sample t-tests and one-sample Wilcoxon signed-rank tests, were conducted to determine whether the assessments of students and ChatGPT differed significantly. For the total scores, the results indicate a consensus between students and ChatGPT on the difficulty of a moderately challenging speech; however, divergences were observed for the other two speeches, classified as more and less difficult. Further comparison of the scores on the three component dimensions shows that students’ evaluations can differ from ChatGPT’s on “Subject Matter,” while there is no significant difference on “Speed of Delivery.” For “Density and Style,” the trend is consistent with that shown in the comparison of total scores. A follow-up interview presents the students’ perspectives on evaluating speech difficulty, revealing that they rely on subjective perceptions as standards for judgement. Given ChatGPT’s capacity to analyze delivery speed and minimize subjective bias, the integration of AIGC tools in educational settings is recommended. Moreover, interpreter trainers should recognize the divergence between students’ subjective perceptions and objective evaluations of speech difficulty, and balance the two to compensate for AIGC tools’ blindness to subjective factors.
Providing AIGC tools with a reliable framework for speech difficulty evaluation could refine material selection, ensuring better alignment with learners’ proficiency levels and thereby optimizing the educational outcomes of interpreter training. Based on the findings and limitations of this study, several promising directions for future research are proposed.
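The statistical comparison described in the abstract, testing a group of student scores against a single reference score from ChatGPT, can be sketched as below. This is a minimal illustration with hypothetical data, not the study’s actual scores: the sample size of 25 matches the abstract, but the score values and the SDI scale assumed here are invented for demonstration.

```python
# Illustrative sketch (hypothetical data): comparing 25 students' SDI
# scores for one speech against ChatGPT's single score, using a
# one-sample t-test and a one-sample Wilcoxon signed-rank test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical SDI total scores from 25 students for one speech
student_scores = rng.normal(loc=6.2, scale=0.8, size=25)

chatgpt_score = 6.0  # hypothetical SDI score assigned by ChatGPT

# One-sample t-test: does the student mean differ from ChatGPT's score?
t_stat, t_p = stats.ttest_1samp(student_scores, popmean=chatgpt_score)

# One-sample Wilcoxon signed-rank test on the differences from the
# reference score (a nonparametric alternative to the t-test)
w_stat, w_p = stats.wilcoxon(student_scores - chatgpt_score)

print(f"t-test:   t = {t_stat:.3f}, p = {t_p:.3f}")
print(f"Wilcoxon: W = {w_stat:.3f}, p = {w_p:.3f}")
```

Running both tests, as the study does, guards against the t-test’s normality assumption: if the two p-values lead to the same conclusion, the result is robust to the distribution of the score differences.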