International Journal of Corpus Linguistics - Online First
Online First articles are the published Version of Record, made available as soon as they are finalized and formatted. They are generally accessible to current subscribers until they are included in an issue, which is then accessible to subscribers to the relevant volume.
Reproducibility and transparency in interpretive corpus pragmatics
Author(s): Martin Schweinberger and Michael Haugh
Available online: 12 June 2025
Abstract: In this paper we extend the discussion about reproducibility in corpus linguistics from quantitative to qualitative corpus-based approaches and argue that concerns about reproducibility can be addressed in interpretive research paradigms like corpus pragmatics. We first suggest that in interpretive research traditions, transparency is more important than reproducibility. We then argue that interpretive research can be made more transparent and accessible by using notebooks to share analytical procedures. We support these claims through a case study in which we analyse responses to information-seeking utterance-final "or" questions in spoken Australian English data. We use a qualitative, discourse analytic approach to systematically examine examples of these utterances from selected corpora. We show how corpus linguistic research can draw on existing infrastructures and tools for ensuring transparency, reproducibility, and replicability of interpretive analyses of the pragmatic functions of linguistic tokens in situated contexts.
Reuse of social media data in corpus linguistics
Author(s): Mikko Laitinen and Paula Rautionaho
Available online: 12 June 2025
Abstract: The use of very large social media datasets in corpus linguistics has obvious benefits. Such data represent a novel source of evidence when compared with structured digital text corpora. However, there is a clear need to assess critically how the effective reuse of data can be handled, how findings can be reproduced, and how results can be generalized. A relevant question concerns the presentation of data to ensure reproducibility and replicability. This article surveys the state of the art in descriptions of data collection and methodological transparency in 30 studies that used Twitter/X as their data. The empirical section investigates how easy it would be to reproduce a study based on these descriptions. While we concentrate on evidence from one social media application, the discussion continues to a presentation of concrete steps that might be used to improve data management related to the reuse, discovery, and evaluation of social media data in general.
Reproducibility, replicability, and robustness in corpus linguistics: An introduction
Author(s): Martin Schweinberger and Michael Haugh
Available online: 12 June 2025
Abstract: This introduction to the special issue Reproducibility, Replicability, and Robustness in Corpus Linguistics calls for more transparent and robust research practices in the field. It situates the discussion within the broader replication crisis in the life and social sciences and explores its relevance for corpus linguistics. The article identifies key areas for improvement (data management, workflows, and reporting) and showcases tools and principles such as FAIR/CARE, version control, reproducible notebooks, and open repositories. It highlights how corpus linguistics can build on open science infrastructures to enhance methodological rigor. Practical challenges, including data sensitivity and skill gaps, are addressed with actionable strategies. The issue brings together contributions that clarify core terminology, test the robustness of established methods, and suggest concrete ways forward. Together, these articles offer conceptual and practical guidance for making corpus linguistic research more open, verifiable, and aligned with broader scientific standards.
Review of Meyer (2023): English corpus linguistics: An introduction
Author(s): Ding Huang
Available online: 10 June 2025
Grammatical complexity in film dialogue: A corpus-based study from a register-functional perspective
Author(s): Maicol Formentelli, Liviana Galiano and Maria Pavesi
Available online: 23 May 2025
Abstract: Grammatical complexity has traditionally been associated with the structural elaboration of texts, and, more recently, with the functionally-motivated use of syntactic patterns exhibiting internal variability along the written-to-spoken register continuum (Biber et al., 2022). Adopting a register-functional approach, the present corpus-based study investigates grammatical complexity in Anglophone film dialogue, focusing on the occurrence of finite and non-finite dependent clauses. Grammatical complexity in film language is assessed in relation to situational characteristics of onscreen dialogue and compared to previous findings on grammatical complexity in spontaneous conversation, with the overarching aim of contributing to corpus-based descriptions of language input relevant for second language acquisition. Results point to a functionally-driven distribution of clausal patterns, balancing narration, realism, emotionality, and economy of expression in the portrayed dialogue. They also show that while film language closely approximates the complexity of spontaneous spoken language, it exhibits distinctive features linked to register-specific communicative functions and medium-related constraints.
Achieving stability in corpus-based analysis of word types
Author(s): Jesse Egbert, Douglas Biber, Bethany Gray and Tove Larsson
Available online: 20 May 2025
Abstract: Rank-ordered lists of word types are ubiquitous in corpus linguistics and applied linguistics. Word lists are commonly developed as aids for language teaching and learning, vocabulary testing, and language description. Yet, these lists are often produced and used without evaluation of their stability, or replicability, across corpus samples. Our primary objective in this paper is to describe the cumulative state of knowledge regarding the stability of corpus-based word type lists, focusing on three goals that motivate the creation and use of rank-ordered lists: identifying key lexical items for learning or teaching, assessing vocabulary size or knowledge, and identifying all items in a language domain. We show that word type lists are far less stable than researchers and practitioners often assume, although there is substantial variability in stability depending on the goals and methods behind list creation.
I’m so OCD lol: A corpus-based study of obsessive-compulsive disorder used as an adjective
Author(s): Jordan Batchelor and Heewon Lee-Laminack
Available online: 15 April 2025
Abstract: Obsessive-compulsive disorder (OCD) is characterized by recurrent thoughts (obsessions) and repetitive behaviors (compulsions) that are thought to help mitigate obsessions (APA, 2013). One issue that has gained attention in popular discourse is the use of OCD as an adjective (e.g. "I’m so OCD"), which is said to trivialize the disorder (NAMI, 2015). We collected a corpus of social media comments including the phrase "degree adverb + OCD". The corpus was tagged with a semantic tagger (Rayson et al., 2004) to investigate the domains around the phrase. About a quarter of the 1,575 comments used the phrase to critique the popular usage of OCD as an adjective, suggesting that it is frequently negatively evaluated. The remaining genuine uses support the idea that the phrase is often used in non-medical contexts, including to express individual preferences for organization and cleanliness. We argue that this usage is negatively evaluated because it demedicalizes OCD and portrays it with a light-hearted tone.
Reproducibility, replicability, robustness, and generalizability in corpus linguistics
Author(s): Joseph Flanagan
Available online: 14 February 2025
Abstract: Establishing the credibility of scientific research involves several related but significantly different concerns. One potential problem in surveying different approaches to these concerns is that of terminology, as some of the basic terms used in the discussion (reproducibility, replicability, robustness, and generalizability) are often used in inconsistent or contradictory ways. This paper proposes to resolve such confusion by providing a terminological framework for discussing what kind of confirmation is necessary for a scientific study to be deemed credible. A study is said to be ‘reproducible’ if we can obtain identical results by performing an identical analysis on identical data, ‘replicable’ if we can obtain consistent results using the same analysis on different data, ‘robust’ if we can obtain consistent results from identical data using a different analysis, and ‘generalizable’ if we can obtain consistent results from different data using a different analysis.
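The four definitions in this abstract amount to a two-by-two grid: identical or different data, crossed with identical or different analysis. As a reading aid only, the grid can be sketched in Python. The function name `classify_confirmation` and its boolean parameters are hypothetical illustrations and do not come from the paper.

```python
def classify_confirmation(same_data: bool, same_analysis: bool) -> str:
    """Return the abstract's term for a given data/analysis pairing.

    Hypothetical helper for illustration; it only encodes the four
    definitions quoted in the abstract above.
    """
    if same_data and same_analysis:
        # Identical analysis on identical data, identical results expected.
        return "reproducible"
    if same_analysis:
        # Same analysis on different data, consistent results expected.
        return "replicable"
    if same_data:
        # Different analysis on identical data, consistent results expected.
        return "robust"
    # Different analysis on different data, consistent results expected.
    return "generalizable"


if __name__ == "__main__":
    # Print the full grid of pairings.
    for same_data in (True, False):
        for same_analysis in (True, False):
            term = classify_confirmation(same_data, same_analysis)
            print(f"same_data={same_data}, same_analysis={same_analysis}: {term}")
```

Note that the abstract asks for identical results only in the reproducible case; the other three cases ask for consistent results, a distinction the sketch records in comments but does not model.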
Syntactic position of contrast markers in different registers of French
Author(s): Jorina Brysbaert and Karen Lahousse
Available online: 16 January 2025
Abstract: This paper presents a quantitative corpus analysis of three syntactically mobile contrast markers in different registers of French: contrastive adverbs, emphatic pronouns, and emphatic pronouns introduced by quant à “as for”. We show that the preferred syntactic position of the three markers is influenced by their form and discourse function, but that the degree of this influence varies across registers. In informal written and spoken French, form and discourse function have a greater impact on syntactic position than in formal written French, where the standard word order subject + verb + other clause elements is favored, and non-neutral (inter)subjective peripheries are avoided. Hence, our analysis provides evidence for the idea that informal written and spoken French are (becoming) more discourse-configurational.
Most Read This Month
The Spoken BNC2014
Author(s): Robbie Love, Claire Dembry, Andrew Hardie, Vaclav Brezina and Tony McEnery