International Journal of Corpus Linguistics - Volume 20, Issue 1, 2015
-
Publication type and discipline variation in published academic writing: Investigating statistical interaction in corpus data
Author(s): Jesse Egbert
pp.: 1–29 (29)
This study uses Multi-Dimensional analysis to describe linguistic variation in a corpus of published academic writing across three publication types in two disciplines. The resulting five dimensions were labeled: “Affective synthesis versus specialized information density”, “Definition and evaluation of new concepts”, “Author-centered stance”, “Reader-friendly narrative”, and “Abstract observation and description”. Factorial ANOVAs were used to test for significant interactions between publication type and discipline on each of the five linguistic dimensions. Statistical interactions were discovered for four of the five dimensions. The appropriate tests for statistical differences, either for main effects or simple effects, were performed, and publication type and discipline patterns were interpreted for all five dimensions. This paper highlights the importance of accounting for all of the independent factors in a corpus, using factorial ANOVAs where appropriate, in order to appropriately analyze and interpret patterns of linguistic variability.
-
Evaluating reliability in quantitative vocabulary studies: The influence of corpus design and composition
Author(s): Don Miller and Douglas Biber
pp.: 30–53 (24)
Recent methodological advances have been used to create word lists based on large corpora. The present paper explores whether these corpora, and the associated lists, are unequivocally more representative. Corpus design considerations have usually focused on issues of external representativeness (representing the target discourse domain), while disregarding issues of internal representativeness (whether the corpus permits reliable descriptions of linguistic variation). This disregard may be especially problematic for studies of lexical variation, where it is difficult to achieve stable, reliable results from corpus analysis. The present paper illustrates these challenges through experiments based on analysis of a corpus representing a highly restricted discourse domain: university-level introductory psychology textbooks. The results indicate that corpus design and composition has a much greater influence on lexical variation than previously recognized, highlighting the need to evaluate internal representativeness in quantitative corpus-based research.
-
The corpus-based identification of cross-lectal synonyms in pluricentric languages
Author(s): Yves Peirsman, Dirk Geeraerts and Dirk Speelman
pp.: 54–80 (27)
This article discusses a corpus-based method for the automatic identification of synonyms across different varieties of the same language. This method, based on the paradigm of distributional semantics, quantifies semantic similarity on the basis of contextual similarity in two comparable corpora. In two case studies for Dutch and German, we show that it automatically identifies the correct synonym for 31% and 25% of the target words, respectively. A manual error analysis moreover indicates that many additional synonyms are very close in the distributional model, while most other distributional neighbours are semantically related to the target word along other dimensions than synonymy. On the basis of these results, we argue that distributional-semantic methods can play a crucial role in the further evolution of corpus-based lexical semantics to a more quantitative discipline.
-
Automatic analysis of thematic structure in written English
Author(s): Kwanghyun Park and Xiaofei Lu
pp.: 81–101 (21)
This paper proposes and describes a computational system for the automatic analysis of thematic structure, as defined in Systemic Functional Linguistics, in written English. The system takes an English text as input and produces as output an analysis of the thematic structure of each sentence in the text. The system is evaluated using data from The Wall Street Journal section of the Penn Treebank (Marcus et al. 1993) and the British Academic Written English corpus (Gardner & Nesi 2013). An experiment using these data shows that the system achieves a high degree of reliability in regard to both identifying theme-rheme boundaries and determining several of the linguistic properties of the identified themes, including syntactic nodes, theme function, markedness, mood types, and theme roles. To illustrate how the system is used, we describe an example application designed to compare collections of novice and expert academic writing in terms of thematic structure.
-
Gloss annotations in the Swedish Sign Language Corpus
Author(s): Johanna Mesch and Lars Wallin
pp.: 102–120 (19)
The Swedish Sign Language Corpus (SSLC) was compiled during the years 2009–2011 and consists of video-recorded conversations with 42 informants between the ages of 20 and 82 from three separate regions in Sweden. The overall aim of the project was to create a corpus of Swedish Sign Language (SSL) that could provide a core data source for research on language structure and use, as well as for dictionary work. A portion of the corpus has been annotated with glosses for signs and Swedish translations, and annotation of the entire corpus is ongoing. In this paper, we outline our scheme for gloss annotation and discuss issues that are relevant in creating the annotation system, with unique glosses for lexical signs, fingerspelling and productive signs. The annotation guidelines discussed in this paper cover both one- and two-handed signs in SSL, based on 33,600 tokens collected for the SSLC.