International Journal of Corpus Linguistics - Volume 29, Issue 4, 2024
-
“People should get their booster”
Author(s): Hang (Joanna) Zou and Ken Hyland
pp.: 447–471 (25)
Abstract: Debates around the efficacy and dangers of vaccination have taken on critical importance with the Covid pandemic and WHO naming vaccine hesitancy as a major global health threat. We explore how writers use two types of blog, academic and journalistic, to promote key public health messages around the effectiveness and necessity of Covid-19 vaccinations to a broad, heterogeneous audience. Examining 120 Covid-19 vaccination themed posts from reputable news and academic blog sites, we compare the different ways writers present a stance and take a position towards vaccines and vaccinations in these different interactional contexts. Findings show that both types of bloggers are clearly aware of the need to convey a stance towards their topic and audiences feel entitled to position themselves in relation to vaccination issues, but with different emphases. The study has important implications for how healthcare information is disseminated and persuasion accomplished in these public arenas of discourse.
-
Case and agreement variation in contact
pp.: 472–506 (35)
Abstract: This study investigates the influence of language contact on morphosyntactic variation in World Englishes, specifically focusing on the joint variation of case and agreement in it-clefts with pronominal clefted constituents. Employing a multifactorial approach within the framework of probabilistic grammar, we examine the distribution of the four relevant it-cleft variants in the GloWbE corpus. We find that language contact, as a language-external factor, impacts the strengths and rankings of language-internal factors but not their directions. Additionally, we observe an intricate interplay between language contact and language-internal factors in shaping morphosyntactic patterns: low-contact varieties tend to display feature-based case and agreement with a high degree of variability, while high-contact varieties tend to exhibit position-based case and agreement with a low degree of variability. These findings shed light on the mechanisms underlying the development of language diversity and structural simplification in World Englishes.
-
Down-sampling from hierarchically structured corpus data
Author(s): Lukas Sönning
pp.: 507–533 (27)
Abstract: Resource constraints often force researchers to downsize the list of tokens returned by a corpus query. This paper sketches a methodology for down-sampling and offers a survey of current practices. We build on earlier work and extend the evaluation of down-sampling designs to settings where tokens are clustered by text file and lexeme. Our case study deals with third-person present-tense verb inflection in Early Modern English and focuses on five predictors: year, gender, genre, frequency, and phonological context. We evaluate two strategies for selecting 2,000 (out of 11,645) tokens: simple down-sampling, where each hit has the same selection probability; and structured down-sampling, where this probability is inversely proportional to the author- and verb-specific token count. We form 500 subsamples using each scheme and compare regression results to a reference model fit to the full set of cases. We observe that structured down-sampling shows better performance on several evaluation criteria.
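Editorial illustration (not taken from the paper): the two selection strategies named in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The token fields `id`, `author` and `verb`, the function names, and the toy data are all hypothetical, and the weight used here (one over the product of the author's and the verb's token counts) is only one possible reading of "inversely proportional to the author- and verb-specific token count"; the article's actual scheme may differ. Weighted sampling without replacement is done with Efraimidis-Spirakis random keys, computed in log form to avoid underflow.

```python
import heapq
import math
import random
from collections import Counter


def simple_downsample(tokens, k, rng):
    """Simple down-sampling: every hit has the same selection probability."""
    return rng.sample(tokens, k)


def structured_downsample(tokens, k, rng):
    """Structured down-sampling (assumed weighting, see lead-in).

    Each token gets weight 1 / (n_author * n_verb), so hits from prolific
    authors and frequent verbs are less likely to be selected. Sampling is
    without replacement via Efraimidis-Spirakis keys.
    """
    author_n = Counter(t["author"] for t in tokens)
    verb_n = Counter(t["verb"] for t in tokens)

    keyed = []
    for i, t in enumerate(tokens):
        weight = 1.0 / (author_n[t["author"]] * verb_n[t["verb"]])
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        # log(u) / weight is the log of the usual key u ** (1 / weight)
        keyed.append((math.log(u) / weight, i))

    # The k largest keys form the weighted sample
    return [tokens[i] for _, i in heapq.nlargest(k, keyed)]


# Hypothetical toy data: one prolific author and ten authors with one hit each
toy_tokens = [{"id": i, "author": "A", "verb": "have"} for i in range(90)]
toy_tokens += [
    {"id": 90 + j, "author": f"R{j}", "verb": "have"} for j in range(10)
]

rng = random.Random(2024)
simple_sample = simple_downsample(toy_tokens, 10, rng)
structured_sample = structured_downsample(toy_tokens, 10, rng)
```

In the toy data, the structured scheme tends to pick far more hits from the single-hit authors than the simple scheme does, which is the balancing effect the abstract describes.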
-
Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis
Author(s): Danni Yu, Luyang Li, Hang Su and Matteo Fuoli
pp.: 534–561 (28)
Abstract: Certain forms of linguistic annotation, like part of speech and semantic tagging, can be automated with high accuracy. However, manual annotation is still necessary for complex pragmatic and discursive features that lack a direct mapping to lexical forms. This manual process is time-consuming and error-prone, limiting the scalability of function-to-form approaches in corpus linguistics. To address this, our study explores the possibility of using large language models (LLMs) to automate pragma-discursive corpus annotation. We compare GPT-3.5 (the model behind the free-to-use version of ChatGPT), GPT-4 (the model underpinning the precise mode of Bing chatbot), and a human coder in annotating apology components in English based on the local grammar framework. We find that GPT-4 outperformed GPT-3.5, with accuracy approaching that of a human coder. These results suggest that LLMs can be successfully deployed to aid pragma-discursive corpus annotation, making the process more efficient, scalable, and accessible.
-
A corpus-based analysis of ‘vernacular synonyms’
Author(s): Caterina Guardamagna
pp.: 562–594 (33)
Abstract: Against the backdrop of the significant social changes taking place during the Renaissance, this paper interrogates the lexical domain of citizenship, focusing on three words deemed near-synonymous in the historical literature: citizens, burgesses, and freemen. The study takes a quantitative corpus-linguistic approach to the data in the Early English Books Online corpus (1550–1699) and consults lexicographical sources (the Oxford English Dictionary, the Historical Thesaurus of the Oxford English Dictionary, and the Lexicons of Early Modern English 1550–1700) to offer an overview of the organisation of the conceptual domain occupied by citizenship terms referring to “dwellers”. The relationships between citizens, burgesses, and freemen over time are addressed through detailed quantitative collocation analysis, considering their overall profile, stability and innovation, and areas of functional overlap and distinctiveness. Overall, the results support historians’ intuitions that citizens, burgesses, and freemen are “vernacular synonyms”.
-
A user-friendly corpus tool for disciplinary data-driven learning
Author(s): Peter Crosthwaite and Vít Baisa
pp.: 595–610 (16)
Abstract: Most corpus tools commonly used for corpus-based data-driven learning (DDL) are designed for research rather than teaching purposes, with much DDL research suggesting learners and their teachers often stop DDL after initial training due to tool-related issues like complex user interfaces and system settings. Based on feedback from secondary-age language learners and their teachers in the Australian context, we present CorpusMate (https://corpusmate.com), a new, user-friendly corpus tool that incorporates several publicly available written and spoken corpora across 20 disciplinary subjects. It offers a range of flexible concordancing, n-gram and data visualisation options to ensure a fast, smooth and simple DDL experience for end users.
-
Review of Di Cristofaro (2023): Corpus approaches to language in social media
Author(s): Aleksandra Sevastianova
pp.: 611–616 (6)
This article reviews Corpus approaches to language in social media.
-
Review of Price (2022): The language of mental illness: Corpus linguistics and the construction of mental illness in the press
Author(s): Katherine Ann Ireland
pp.: 617–621 (5)
This article reviews The language of mental illness: Corpus linguistics and the construction of mental illness in the press.