Representation of speech in CorpAfroAs
This paper surveys the transcriptional aspects of CorpAfroAs, a spoken corpus of Afroasiatic languages, with a focus on the representation of phonemes, morphemes, words, and longer units. We discuss the distinction between the prosodic, phonological, and morphosyntactic word, as well as that between the intonation unit, the paratone, and the period. Segmentation and transcription choices are analyzed, and the scientific results they make possible are presented: comparing the phonological word with the morphosyntactic word allows the systematic study of sandhi and similar phenomena, and of the syntax/phonology interface. The segmentation into prosodic units allows the study of the interfaces between prosody and syntax, information structure, and discourse.