International Journal of Corpus Linguistics - Volume 28, Issue 3, 2023
-
Things we smell and things they smell like
Author(s): Thomas Poulton
pp.: 291–317 (27)
Abstract: The sense of smell has been relatively neglected in the Western research. It is not regarded as particularly useful compared to the perceived importance of senses like sight, sound, and touch. Correspondingly, English speakers are ill-equipped to describe qualities of smells, instead invoking entities that share similar olfactory qualities, e.g. like roses. This raises the question: which odours do English speakers frequently refer to, and which terms describe them? This corpus-driven study looks at nouns in olfactory contexts, and the conceptual domains they fall into. Results show that speakers invoke different smells according to context: when talking about a smell they perceive, when describing a smell, or in a description of another smell, which demonstrates the differential communicative functions of smells. Further analysis shows that smells that are described are more variable than those used as descriptors, and smells being used to describe are more emotional using psychometric norming data.
-
Assessing word commonness
Author(s): Mikkel Ekeland Paulsen
pp.: 318–343 (26)
Abstract: The article investigates the two main corpus indicators of word commonness, frequency and dispersion, through a cross-validation analysis of frequency and four dispersion measures (‘Range’, ‘Chi-squared’, ‘Deviation of Proportions’ and ‘Juilland’s D’). The approach provides an estimation of the capacity of the named measures to predict the distribution of corpus items in an extracted language sample. Based on a dataset of 273 Norwegian compounds, the results show that especially Deviation of Proportions is a robust measure of dispersion that can be used in conjunction with frequency to substantiate assertions of word commonness based on corpus data. In addition, dispersion measures do not only reflect what sort of distribution the frequency statistic is generated from, but also how reliable the frequency estimation in the corpus sample is in terms of giving an accurate representation of frequency in the language variety that the corpus is sampled from.
-
Research trends in corpus linguistics
Author(s): Peter Crosthwaite, Sulistya Ningrum and Martin Schweinberger
pp.: 344–377 (34)
Abstract: This paper uses a bibliometric analysis to map the field of Corpus Linguistics (CL) research in arts and humanities over the last 20 years, tracking changes in popular CL research topics, outlets, highly cited authors, and geographical origins based on the metadata of 5,829 CL-related articles from 429 Scopus-indexed journals. Results reveal an increase in corpus-assisted discourse studies, lexical bundles and academic writing, alongside newer topics including multilingualism and social media. CL studies span 193 languages/dialects with a significant rise in Chinese, Russian, Spanish, and Italian CL research over the past decade. Clusters of highly cited CL researchers are identified spanning (inter)disciplinary research areas. An increase of CL researchers in China, Poland, South Korea, Japan, and more is evidence of the now global reach of CL research. These findings mirror diachronic socio-cultural developments in applied linguistics and society more generally and provide insights into what CL research might come next.
-
Differences in syntactic annotation affect retrieval
Author(s): Eva Zehentner, Marianne Hundt, Gerold Schneider and Melanie Röthlisberger
pp.: 378–406 (29)
Abstract: Prepositional phrases (PPs) play an important part in English argument structure constructions, but pose considerable challenges for linguistic investigations of any kind. In addition to the fact that PP-attachment is generally notoriously difficult to model computationally, a particularly striking methodological challenge in investigating verb-dependent PPs across (synchronic and/or diachronic) corpora is that such cross-corpus studies may have to rely on material annotated with different tools. This study evaluates the impact that such differences in corpus annotation may have on retrieval of verb-attached PPs by means of data from Early and Late Modern English corpora. Our intrinsic (recall/precision) and extrinsic parser evaluation shows that annotation does play a role, but that the noise introduced is negligible as far as frequency developments are concerned.
-
A year to remember?
Author(s): Paul Baker
pp.: 407–429 (23)
Abstract: This paper describes the collection and analysis of the most recent edition of the Brown family, the BE21 corpus, consisting of 1 million words of written British English texts, published in 2021. Using the Coefficient of Variance, the frequencies of part of speech tags in BE21 are compared against the other four British members of the Brown family (from 1931, 1961, 1991 and 2006). Part of speech tags that are steadily increasing or decreasing in all five or the latest three corpora are examined via concordance lines and their distributions in order to identify long-standing and emerging trends in British English. The analysis points to the continuation of some trends (such as declines in modal verbs and titles of address), along with newer trends like the rise of first person pronouns. The analysis indicates that more general trends of densification, democratisation and colloquialisation are continuing in British English.
-
Annotation uncertainty in the context of grammatical change
Author(s): Marie-Luis Merten, Marcel Wever, Michaela Geierhos, Doris Tophinke and Eyke Hüllermeier
pp.: 430–459 (30)
Abstract: This paper elaborates on the notion of uncertainty in the context of annotation in large text corpora, specifically focusing on (but not limited to) historical languages. Such uncertainty might be due to inherent properties of the language, for example, linguistic ambiguity and overlapping categories of linguistic description, but could also be caused by a lack of annotation expertise. By examining annotation uncertainty in more detail, we identify the sources, deepen our understanding of the nature and different types of uncertainty encountered in daily annotation practice, and discuss practical implications of our theoretical findings. This paper can be seen as an attempt to reconcile the perspectives of the main scientific disciplines involved in corpus projects, linguistics and computer science, to develop a unified view and to highlight the potential synergies between these disciplines.