Assessing the accuracy of Chinese speech-to-text tools for Chinese as foreign language learners

Abstract

This article examines the effectiveness of four Chinese speech-to-text (CSTT) tools in transcribing the speech of Chinese as a Foreign Language (CFL) learners across different ACTFL proficiency levels. The results indicate notable differences in transcription accuracy. Among the CSTT tools, ChatGPT 3.5 proves to be the most accurate, followed by WeChat and Baidu IME, while iOS IME shows the lowest performance. Except for iOS IME, these tools achieve 100% accuracy at the Distinguished and Superior levels, where speech closely approximates native fluency. ChatGPT 3.5 excels from the Novice to Distinguished levels but occasionally overcorrects Novice-level CFL learners’ erroneous speech. WeChat performs robustly above the Novice level, while Baidu IME performs best at the Advanced level and above. Conversely, iOS IME displays significant limitations at all levels. This study offers new perspectives on “good pronunciation” and the debate over handwriting versus typing Chinese characters for CFL learners.
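The abstract reports transcription accuracy without specifying how it is computed; that detail belongs to the full text. Purely as an illustrative sketch, and not the authors’ method, the Python snippet below derives a character error rate (CER) from the edit distance between a reference transcript and a CSTT tool’s output, one common way accuracy figures for Chinese ASR are obtained. The example sentences and function names are hypothetical.

```python
# Illustrative only: the article's accuracy metric is not given in the abstract.
# Character error rate (CER) -- Levenshtein distance divided by reference
# length -- is a common metric for Chinese ASR and is assumed here.

def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn `hyp` into `ref` (computed row by row to save memory)."""
    prev = list(range(len(hyp) + 1))          # distances of "" vs. hyp prefixes
    for i, r in enumerate(ref, start=1):
        curr = [i]                            # distance of ref prefix vs. ""
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1         # 0 if the characters match
            curr.append(min(prev[j] + 1,      # skip a reference character
                            curr[j - 1] + 1,  # skip a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical learner utterance and an imperfect transcription of it.
    reference = "我喜欢学习中文"       # intended sentence (7 characters)
    hypothesis = "我喜欢学中文"        # transcription missing one character
    error = cer(reference, hypothesis)
    print(f"CER: {error:.2%}, accuracy: {1 - error:.2%}")  # CER: 14.29%, accuracy: 85.71%
```

Under this assumed metric, the 100% accuracy reported at the Distinguished and Superior levels would correspond to a CER of zero against the reference transcript.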

DOI: 10.1075/csl.24013.fen