The notion of sentence and other discourse units in corpus annotation
The notion of sentence – as it is defined in syntactic, semantic, graphic and prosodic terms – is not a suitable maximal unit for the prosodic and syntactic annotation of spoken corpora. Nevertheless, many syntactic and prosodic annotation systems take it as a reference. We present the modular approach we adopted for annotating the Rhapsodie corpus of spoken French, which led us to distinguish three types of elementary units operating in discourse (government units, illocutionary units, and intonational periods) and to annotate them separately. We then describe the types of interactions identified among these levels of cohesion. On this basis, we propose a reappraisal of the traditional notion of sentence and define two additional types of discourse units, which we regard as the minimal and the maximal spans of the sentence.