Automatic discourse segmentation of L1 and L2 spoken English transcripts

Abstract

Natural language processing (NLP) tools, primarily trained on L1 written English, have achieved remarkable performance but are rarely applied to L2 learner data. This study uses a rule-based segmenter to automatically segment spoken English discourse produced by both L1 speakers and learners, and presents novel preparatory data-cleaning steps that combine a state-of-the-art disfluency detector with additional rules to improve segmentation performance. In three successive segmentation tests on data from the Louvain Corpus of Native English Conversation (LOCNEC; De Cock, 2004) and the Louvain International Database of Spoken English Interlanguage (LINDSEI; Gilquin et al., 2010), we achieve an enhanced segmentation performance that is similar for both the L1 and L2 data (.84). Our approach highlights the effectiveness of leveraging existing NLP tools to process disfluent L2 spoken transcripts, facilitating automatic discourse analysis in Learner Corpus Research (LCR). The code for executing our pipeline is publicly available for future research.

Available under the CC BY 4.0 license.
https://doi.org/10.1075/ijlcr.24023.yan
2025-10-07
2025-11-13

References

  1. Bach, N., & Huang, F.
    (2019) Noisy BiLSTM-based models for disfluency detection. Proceedings of Interspeech 2019.
    https://doi.org/10.21437/Interspeech.2019-1336
  2. Bhat, S., & Yoon, S. Y.
    (2015) Automatic assessment of syntactic complexity for spontaneous speech scoring. Speech Communication.
    https://doi.org/10.1016/j.specom.2014.09.005
  3. Biber, D., Gray, B., & Staples, S.
    (2016) Predicting patterns of grammatical complexity across language exam task types and proficiency levels. Applied Linguistics.
    https://doi.org/10.1093/applin/amu059
  4. Caines, A., & Buttery, P.
    (2014) The effect of disfluencies and learner errors on the parsing of spoken learner language. In Y. Goldberg, Y. Marton, I. Rehbein, Y. Versley, Ö. Çetinoğlu, & J. Tetreault (Eds.), Proceedings of the first joint workshop on statistical parsing of morphologically rich languages and syntactic analysis of non-canonical languages. Dublin City University. Retrieved from https://aclanthology.org/W14-6107.pdf
  5. Carlson, L., Okurowski, M. E., & Marcu, D.
    (2002) RST discourse treebank. Linguistic Data Consortium.
  6. Chambers, L., & Ingham, K.
    (2011) The BULATS online speaking test. Research Notes. Retrieved from www.cambridgeenglish.org/images/23161-researchnotes-43.pdf
  7. Charniak, E., & Johnson, M.
    (2001) Edit detection and parsing for transcribed speech. Second Meeting of the North American Chapter of the Association for Computational Linguistics. NAACL 2001. Retrieved from https://aclanthology.org/N01-1016.pdf
  8. Chen, M., & Zechner, K.
    (2011) Computing and evaluating syntactic complexity features for automated scoring of spontaneous non-native speech. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics.
  9. Cieri, C., Graff, D., Kimball, O., Miller, D., & Walker, K.
    (2004) Fisher English training speech part 1 transcripts LDC2004T19. Linguistic Data Consortium.
  10. Cieri, C., Graff, D., Kimball, O., Miller, D., & Walker, K.
    (2005) Fisher English training speech part 2 transcripts LDC2005T19. Linguistic Data Consortium.
  11. Cresti, E.
    (1995) Speech act units and informational units. In E. Fava (Ed.), Speech acts and linguistic research. Proceedings of the Workshop, Center for Cognitive Science of New York at Buffalo.
  12. De Cock, S.
    (2004) Preferred sequences of words in NS and NNS speech. Belgian Journal of English Language and Literatures (BELL), New Series.
  13. Dong, Q., Wang, F., Yang, Z., Chen, W., Xu, S., & Xu, B.
    (2019) Adapting translation models for transcript disfluency detection. Proceedings of the AAAI Conference on Artificial Intelligence.
    https://doi.org/10.1609/aaai.v33i01.33016351
  14. Feng, V. W., & Hirst, G.
    (2014) Two-pass discourse segmentation with pairing and global features. CoRR, abs/1407.8215. Retrieved from https://arxiv.org/abs/1407.8215
  15. Foster, P., Tonkyn, A., & Wigglesworth, G.
    (2000) Measuring spoken language: A unit for all reasons. Applied Linguistics.
    https://doi.org/10.1093/applin/21.3.354
  16. Gilquin, G., De Cock, S., & Granger, S.
    (2010) The Louvain International Database of Spoken English Interlanguage: Handbook and CD-ROM. Presses universitaires de Louvain.
  17. Godfrey, J. J., Holliman, E. C., & McDaniel, J.
    (1992) SWITCHBOARD: Telephone speech corpus for research and development. Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-92). IEEE.
    https://doi.org/10.1109/ICASSP.1992.225858
  18. Guhr, O., Schumann, A.-K., Bahrmann, F., & Böhme, H. J.
    (2021) FullStop: Multilingual Deep Models for Punctuation Prediction. Proceedings of the Swiss Text Analytics Conference 2021. CEUR Workshop Proceedings. Retrieved from ceur-ws.org/Vol-2957/sepp_paper4.pdf
  19. Himmelmann, N. P.
    (2006) The challenges of segmenting spoken language. In J. Gippert, N. P. Himmelmann, & U. Mosel (Eds.), Essentials of language documentation. Mouton De Gruyter.
    https://doi.org/10.1515/9783110197730.253
  20. Hirschberg, J., & Litman, D.
    (1993) Empirical studies on the disambiguation of cue phrases. Computational Linguistics.
  21. Hoek, J., Evers-Vermeul, J., & Sanders, T. J. M.
    (2018) Segmenting discourse: Incorporating interpretation into segmentation? Corpus Linguistics and Linguistic Theory.
    https://doi.org/10.1515/cllt-2016-0042
  22. Honnibal, M., & Johnson, M.
    (2014) Joint incremental disfluency detection and dependency parsing. Transactions of the Association for Computational Linguistics.
    https://doi.org/10.1162/tacl_a_00171
  23. Hough, J., & Schlangen, D.
    (2015) Recurrent neural networks for incremental disfluency detection. Proceedings of Interspeech 2015.
    https://doi.org/10.21437/Interspeech.2015-264
  24. Izumi, E., Uchimoto, K., & Isahara, H.
    (2004) The NICT JLE Corpus: Exploiting the language learners’ speech database for research and education. The International Journal of the Computer, the Internet and Management.
  25. Johnson, M., & Charniak, E.
    (2004) A TAG-based noisy channel model of speech repairs. Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics.
    https://doi.org/10.3115/1218955.1218960
  26. Joty, S., Carenini, G., & Ng, R. T.
    (2015) Codra: A novel discriminative framework for rhetorical analysis. Computational Linguistics.
    https://doi.org/10.1162/COLI_a_00226
  27. Kahane, S., Caron, B., Strickland, E., & Gerdes, K.
    (2021) Annotation guidelines of UD and SUD treebanks for spoken corpora: a proposal. In D. Dakota, K. Evang, & S. Kübler (Eds.), Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021). Association for Computational Linguistics.
  28. Knill, K. M., Gales, M. J., Manakul, P. P., & Caines, A. P.
    (2019) Automatic grammatical error detection of non-native spoken learner English. ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE.
    https://doi.org/10.1109/ICASSP.2019.8683080
  29. Kyle, K., Eguchi, M., Miller, A., & Sither, T.
    (2022) A dependency treebank of spoken second language English. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). Association for Computational Linguistics.
    https://doi.org/10.18653/v1/2022.bea-1.7
  30. Kyle, K., & Eguchi, M.
    (2024) Evaluating NLP models with written and spoken L2 samples. Research Methods in Applied Linguistics.
    https://doi.org/10.1016/j.rmal.2024.100120
  31. Le Thanh, H., Abeysinghe, G., & Huyck, C.
    (2004) Automated discourse segmentation by syntactic information and cue phrases. Proceedings of the IASTED International Conference on Artificial Intelligence and Applications (AIA 2004), Innsbruck, Austria. IASTED.
  32. Lou, P. J., & Johnson, M.
    (2017) Disfluency detection using a noisy channel model and a deep neural language model. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Volume 2 (Short Papers). Association for Computational Linguistics.
    https://doi.org/10.18653/v1/P17-2087
  33. Lou, P. J., & Johnson, M.
    (2020) Improving disfluency detection by self-training a self-attentive model. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.
    https://doi.org/10.18653/v1/2020.acl-main.346
  34. Lu, Y., Gales, M. J. F., Knill, K. M., Manakul, P., & Wang, Y.
    (2019) Disfluency detection for spoken learner English. Proceedings of the 8th ISCA Workshop on Speech and Language Technology in Education (SLaTE 2019).
    https://doi.org/10.21437/SLaTE.2019-14
  35. Lu, Y., Gales, M. J. F., & Wang, Y.
    (2020) Spoken language ‘grammatical error correction.’ Proceedings of Interspeech 2020.
    https://doi.org/10.21437/Interspeech.2020-1852
  36. Mann, W., & Thompson, S.
    (1988) Rhetorical Structure Theory: Toward a functional theory of text organization. Text — Interdisciplinary Journal for the Study of Discourse.
    https://doi.org/10.1515/text.1.1988.8.3.243
  37. Meurers, D.
    (2015) Learner corpora and natural language processing. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research. Cambridge University Press.
    https://doi.org/10.1017/CBO9781139649414.024
  38. Moore, R., Caines, A., Graham, C., & Buttery, P.
    (2015) Incremental dependency parsing and disfluency detection in spoken learner English. In P. Král & V. Matoušek (Eds.), Text, Speech, and Dialogue: TSD 2015. Springer.
    https://doi.org/10.1007/978-3-319-24033-6_53
  39. Oberländer, L., & Klinger, R.
    (2020) Token sequence labelling vs. clause classification for English emotion stimulus detection. Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics.
  40. Ostendorf, M., & Hahn, S.
    (2013) A sequential repetition model for improved disfluency detection. Proceedings of Interspeech 2013.
    https://doi.org/10.21437/Interspeech.2013-604
  41. Passonneau, R. J., & Litman, D.
    (1997) Discourse segmentation by human and automated means. Computational Linguistics.
  42. Pietrandrea, P., Kahane, S., Lacheret, A., & Sabio, F.
    (2014) The notion of sentence and other discourse units in corpus annotation. In T. Raso & H. Mello (Eds.), Spoken corpora and linguistic studies. John Benjamins.
    https://doi.org/10.1075/scl.61.12pie
  43. Polanyi, L.
    (1988) A formal model of the structure of discourse. Journal of Pragmatics.
    https://doi.org/10.1016/0378-2166(88)90050-1
  44. Qian, X., & Liu, Y.
    (2013) Disfluency detection using multi-step stacked learning. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. NAACL.
  45. Rocholl, J., Zayats, V., Walker, D., Murad, N., Schneider, A., & Liebling, D.
    (2021) Disfluency detection with unlabeled data and small BERT models. Proceedings of Interspeech 2021.
    https://doi.org/10.21437/Interspeech.2021-351
  46. Römer, U., Roberson, A., O’Donnell, M. B., & Ellis, N. C.
    (2014) Linking learner corpus and experimental data in studying second language learners’ knowledge of verb-argument constructions. ICAME Journal.
    https://doi.org/10.2478/icame-2014-0006
  47. Sacks, H., Schegloff, E. A., & Jefferson, G.
    (1974) A simplest systematics for the organization of turn-taking for conversation. Language.
    https://doi.org/10.1353/lan.1974.0010
  48. Sanders, T., & Wijk, C.
    (1996) PISA — A procedure for analyzing the structure of explanatory texts. Text & Talk.
    https://doi.org/10.1515/text.1.1996.16.1.91
  49. Schilperoord, J., & Verhagen, A.
    (1998) Conceptual dependency and the clausal structure of discourse. In J. Koenig (Ed.), Discourse and cognition: bridging the gap. CSLI Publications.
  50. Shriberg, E. E.
    (1994) Preliminaries to a theory of speech disfluencies (Unpublished doctoral dissertation). University of California at Berkeley.
  51. Skidmore, L.
    (2022) Incremental disfluency detection for spoken learner English (Doctoral dissertation). University of Sheffield.
    https://doi.org/10.18653/v1/2022.bea-1.31
  52. Skidmore, L., & Moore, R.
    (2022) Incremental disfluency detection for spoken learner English. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). Association for Computational Linguistics.
    https://doi.org/10.18653/v1/2022.bea-1.27
  53. Soricut, R., & Marcu, D.
    (2003) Sentence level discourse parsing using syntactic and lexical information. Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. NAACL 2003.
    https://doi.org/10.3115/1073445.1073475
  54. Stede, M.
    (2012) Small discourse units and coherence relations. In G. Hirst (Ed.), Discourse processing. Springer International Publishing.
    https://doi.org/10.1007/978-3-031-02144-2_4
  55. Stede, M.
    (2020) Automatic argumentation mining and the role of stance and sentiment. Journal of Argumentation in Context.
    https://doi.org/10.1075/jaic.00006.ste
  56. Subba, R., & Di Eugenio, B.
    (2007) Automatic discourse segmentation using neural networks. Proceedings of the 11th Workshop on the Semantics and Pragmatics of Dialogue. SEMDIAL.
  57. Tofiloski, M., Brooke, J., & Taboada, M.
    (2009) A syntactic and lexical-based discourse segmenter. In K.-Y. Su, J. Su, J. Wiebe, & H. Li (Eds.), Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. Association for Computational Linguistics.
    https://doi.org/10.3115/1667583.1667609
  58. Van Enschot, R., Spooren, W., van den Bosch, A., Burgers, C., Degand, L., Evers-Vermeul, J., … & Maes, A.
    (2024) Taming our wild data: On intercoder reliability in discourse research. Dutch Journal of Applied Linguistics.
    https://doi.org/10.51751/dujal16248
  59. Van Hest, E., Poulisse, N., & Bongaerts, T.
    (1997) Self-repair in L1 and L2 production: an overview. International Journal of Applied Linguistics.
    https://doi.org/10.1075/itl.117-118.05van
  60. Wang, Y., Li, S., & Yang, J.
    (2018) Toward fast and accurate neural discourse segmentation. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
    https://doi.org/10.18653/v1/D18-1116
  61. Wierszycka, J.
    (2013) Phrasal verbs in learner English: a semantic approach. A study based on a POS-tagged spoken corpus of learner English. Research in Corpus Linguistics.
    https://doi.org/10.32714/ricl.01.07
  62. Wu, S., Zhang, D., Zhou, M., & Zhao, T.
    (2015) Efficient disfluency detection with transition-based parsing. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Volume 1 (Long Papers). Association for Computational Linguistics.
    https://doi.org/10.3115/v1/P15-1048
  63. Yu, J., Zhang, L., Wu, S., & Zhang, B.
    (2017) Rhythm and disfluency: Interactions in Chinese L2 English speech. 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA).
    https://doi.org/10.1109/ICSDA.2017.8384459
  64. Zayats, V., Ostendorf, M., & Hajishirzi, H.
    (2016) Disfluency detection using a bidirectional LSTM. Proceedings of Interspeech.
    https://doi.org/10.21437/Interspeech.2016-1247
  65. Zirn, C., Niepert, M., Stuckenschmidt, H., & Strube, M.
    (2011) Fine-grained sentiment analysis with structural features. Proceedings of 5th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing.
  66. Zwarts, S., & Johnson, M.
    (2011) The impact of language models and loss functions on repair disfluency detection. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics.