Sharing data in small and endangered languages
Speakers of small or ‘under-resourced’ languages often first contact the world of Information Technology via the effort of field linguists. Good practices in linguistic data management include the separation of structure and content and of data and metadata formats. Primary outputs of field research (lexicon, transcripts and interlinear glossed text collections, and their associated media) need to be coded and preserved. Long-term access to these data is addressed by the establishment of archives that also act as the locus for training and advocacy for well-formed data. In this paper we discuss two such archives, one in Australia, the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC), and the other in France, the “Archiving Project” from the LACITO/CNRS.