Contrastive corpus annotation in the CONTRANOT project
In this paper we outline a number of issues and problems which arise during the process of contrastive human-coded corpus annotation of certain semantic and discourse categories within the framework of the CONTRANOT project, which aims at the creation and validation of contrastive functional descriptions through corpus analysis and annotation. Human-coded corpus annotation is a preliminary step for the training of computer algorithms which allow the automation of the annotation of large corpora, but it can also serve as a mechanism for testing aspects of linguistic theories empirically, such as theory formation and theory redefinition, as well as for enriching theories with quantitative information. The work reported in this paper focuses on the annotation of two categories, Thematisation and Modality, to illustrate the challenges researchers face when developing well-designed and reliable annotation procedures for complex linguistic phenomena in a contrastive manner. We describe the annotation tasks and procedures developed so far, which include the design of annotation schemas on the basis of available linguistic theories and the testing of their reliability through agreement studies. We also evaluate and discuss the results of the annotations on the basis of their relevance for the theoretical characterisation of the investigated phenomena. We expect that our work will have an impact on the area of contrastive textual analysis, and that it will pave the way for the development of automated annotation systems for computational applications.