International Journal of Corpus Linguistics - Volume 21, Issue 3, 2016
- The Pragmatic Annotation Scheme of the SPICE-Ireland Corpus
Author(s): John M. Kirk
pp.: 299–322 (24)
This paper builds on the paradox whereby transcriptions are the object of study for investigations into the spoken language yet omit so much of “what is heard” when an utterance is made. Transcriptions are agnostic with regard to Searlean notions of illocutionary force and perlocutionary effect. The paper proposes that the enhancement of transcriptions with pragmatic and prosodic annotation overcomes that paradox and captures the original utterance more objectively. It argues that annotation is part of transcription. It presents, with examples, a brief summary of each part of the Pragmatic Annotation Scheme developed for the SPICE-Ireland Corpus: speech acts, tone movements, discourse markers, utterance tags and quotatives.
- Semi-lexical features in corpus transcription
Author(s): Gisle Andersen
pp.: 323–347 (25)
An aspect of corpus compilation that poses a particular challenge is the question of how to orthographically transcribe units that are not part of any standardised vocabulary. Among the problematic categories we find voiced pauses, minimal response signals, interjections, certain discourse markers, phonologically reduced forms, colloquialisms and dialect forms. Such semi-lexical features are usually represented by regular phonemic-graphemic correspondences but are nevertheless often inconsistently handled. This paper reviews a number of existing transcription guidelines and assesses whether the recommendations they provide are sufficient and detailed enough to secure a consistent transcription of the categories mentioned. Further, the paper assesses to what extent transcription of semi-lexical features is consistent within and across two spoken corpora. On the basis of a cross-corpus comparison of the Bergen Corpus of London Teenage Language (COLT) and the London English Corpus (LEC), the paper provides specific recommendations for corpus transcription.
- Compiling computer-mediated spoken language corpora
Author(s): Stefan Diemer, Marie-Louise Brunner and Selina Schmidt
pp.: 348–371 (24)
This paper discusses key issues in the compilation of spoken language corpora in a computer-mediated communication (CMC) environment, using data from the Corpus of Academic Spoken English (CASE), a corpus of Skype conversations currently being compiled at Saarland University, Germany, in cooperation with European and US partners. Based on first findings, Skype is presented as a suitable tool for collecting informal spoken data. In addition, new recommendations concerning data compilation and transcription are put forward to supplement existing best practice as presented in Wynne (2005). We recommend the preservation of multimodal features during anonymisation, and the addition of annotation elements already at the transcription stage, particularly CMC-related discourse features, English as a Lingua Franca (ELF) features (e.g. non-standard language and code-switching), as well as the inclusion of prosodic, paralinguistic, and non-verbal annotation. Additionally, we propose a layered corpus design in order to allow researchers to focus on specific annotation features.
- Accounting for ELF
Author(s): Ruth Osimk-Teasdale and Nora Dorn
pp.: 372–395 (24)
This paper reports on some issues encountered when using various ‘external points of reference’ in the development of POS-tagging guidelines for the Vienna-Oxford International Corpus of English (VOICE). VOICE is a corpus of spoken English as a Lingua Franca (ELF) containing naturally occurring, plurilingual data. As in all kinds of natural language use, speakers recorded in VOICE exploit available linguistic resources, often resulting in non-codified language use and language which is difficult to classify unambiguously. However, detailed tagging solutions for such phenomena are rarely reported. We discuss the usefulness and limitations of external points of reference with regard to their suitability for POS-tagging VOICE and address methodological as well as practical issues, especially the handling of non-codified language use and different types of ambiguities. We suggest that the solutions found, and the theoretical approach adopted, could be relevant for the tagging of other spoken corpora.
- Good practices in the compilation of FOLK, the Research and Teaching Corpus of Spoken German
Author(s): Thomas Schmidt
pp.: 396–418 (23)
This paper presents practices in the compilation of FOLK, the Research and Teaching Corpus of Spoken German, a large collection of spontaneous verbal interaction from diverse discourse domains. After introducing the aims and organisational circumstances of the construction of FOLK, the general idea discussed is that good practices cannot be developed without considering methodological, technological and organisational aspects on equal footing. Starting from this idea, this paper inspects more closely some actual practices in FOLK, namely the handling of legal (especially privacy protection) issues, the decisions taken for the transcription and annotation workflow, and the question of how to best disseminate a corpus like FOLK. The final section sketches some possible future improvements for practices in FOLK.
- Flexible multi-layer spoken dialogue corpora
Author(s): Simon Sauer and Anke Lüdeling
pp.: 419–438 (20)
This paper describes the construction of deeply annotated spoken dialogue corpora. To ensure a maximum of flexibility (in the degree of normalization, the types and formats of annotations, the possibilities for modifying and extending the corpus, or the use for research questions not originally anticipated), we propose a flexible multi-layer standoff architecture. We also take a closer look at the interoperability of tools and formats compatible with such an architecture. Free access to the corpus data through corpus queries, visualizations, and downloads (including documentation, metadata, and the original recordings) enables transparency, verifiability, and reproducibility of every step of interpretation throughout corpus construction and of any research findings obtained from this data.