Volume 22, Issue 2
  • ISSN: 1384-6655
  • E-ISSN: 1569-9811


Discourse markers (DMs) and (dis)fluency have been extensively studied as separate phenomena, but corpus-based research combining large-scale yet fine-grained annotations of both categories has never been carried out before. Integrating these two levels of analysis, while methodologically challenging, is not only innovative but also highly relevant to the investigation of spoken discourse in general and of form-meaning patterns in particular. This paper aims to provide corpus-based evidence that DMs and other disfluencies (e.g. pauses, repetitions) are register-sensitive and tend to combine in recurrent clusters. These claims are supported by quantitative findings on the variation and combination of DMs with other (dis)fluency devices in DisFrEn, a richly annotated, comparable English-French corpus representative of eight interaction settings. Adopting a usage-based approach to meaning-in-context, the analysis uncovers the prominent place of DMs within (dis)fluency, as well as meaningful association patterns between forms and functions.


  • Article Type: Research Article
  • Keyword(s): corpus annotation; discourse markers; disfluency; speech; usage-based