Methodological issues for spontaneous speech corpora compilation
Spontaneous speech corpus compilation has grown considerably over the past 20 years. This growth is due mainly to technological advances that allow highly accurate recording in vivo, new insights from empirically based linguistic theory, concern for the documentation of threatened languages, and the high relevance of the findings to speech recognition applications. This paper discusses methodologies associated with spontaneous speech corpus compilation that shed light on aspects relevant to the understanding of linguistic phenomena that pertain to spoken language. The compilation process of C-ORAL-BRASIL I, an informal spontaneous speech corpus of Brazilian Portuguese, is used, among other examples, as the basis for the discussion.