From archive to corpus: Transcription and annotation in the creation of signed language corpora
- Source: International Journal of Corpus Linguistics, Volume 15, Issue 1, Jan 2010, pp. 106-131
Abstract
Annotations are an important resource in corpus-based linguistic research. Indeed, the most important feature of a modern signed language corpus should be that it has been annotated rather than simply transcribed. Digital multimedia annotation software can now turn language recordings into machine-readable texts using gloss-based annotations, without the utterances first having to be transcribed, provided that sign tokens are identified and discriminated according to type. Further annotations can then be appended to these units. However, unique identifiers of sign types (or ‘ID-glosses’) can only be used if a comprehensive reference lexical database of the language already exists. To create a basic multi-purpose reference signed language corpus, therefore, linguists should give priority to annotation using ID-glosses over transcription. Contrary to expectations, the effort spent on a transcription that does not allow sign types to be uniquely identified will not produce a machine-readable corpus in any meaningful sense.