Abstract
Interpreting corpora serve as the descriptive foundation of research and as the ‘ground truth’ against which machine interpreting technologies are evaluated. However, access to corpora remains a critical bottleneck in interpreting studies. Data are difficult to collect and process, and there are no corpus publication venues specific to interpreting and translation. In this article, we present two technical infrastructures that facilitate corpus access: a metadata schema that standardises corpus description, and the Unified Interpreting Corpus (UNIC) platform for searching and publishing data and metadata. We designed both infrastructures following the internationally established FAIR (findability, accessibility, interoperability and reusability) and CARE (collective benefit, authority to control, responsibility and ethics) principles for scientific data management and stewardship. The design draws on a review of 125 spoken- and signed-language interpreting corpora, relevant international standards and community knowledge, and it is built with open-source technologies. Feedback from interpreting students, researchers and interpreters shows that they perceived UNIC as more useful than general-purpose search portals and were more satisfied with it. Overall, we illustrate a value- and consensus-driven path towards optimising the use of existing interpreting corpora and carefully curating new ones. This path avoids duplicated effort, helps to chart research directions and fosters co-design with communities.