International Journal of Corpus Linguistics - Volume 19, Issue 3, 2014
-
Frequency effects and second language lexical acquisition: Word types, word tokens, and word production
Author(s): Scott Crossley, Tom Salsbury, Ashley Titak and Danielle McNamara
pp. 301–332 (32)
Frequency effects in an L1 and L2 longitudinal corpus were investigated using Zipfian distribution analyses and linear curve estimations. The results demonstrated that the NS lexical input exhibited Zipfian distributions, but that the L2 lexical output did not match the NS Zipfian patterns. Word frequency analyses indicated that NS interlocutors modify their lexicon such that frequency scores decrease as a function of time that L2 learners have studied English. In contrast, the word frequency scores for the L2 output increased as a function of time. Post-hoc analyses indicated that differences in frequency scores between NS input and L2 output were best explained by the repetition of infrequent words, but not frequent words by L2 learners in the early stages of language acquisition. The results question absolute frequency interpretations of lexical acquisition for L2 learners and provide evidence for usage-based approaches for language learning.
-
Parallel corpora make sense: Bypassing the knowledge acquisition bottleneck for Word Sense Disambiguation
Author(s): Els Lefever and Véronique Hoste
pp. 333–367 (35)
We present a multilingual approach to Word Sense Disambiguation (WSD), which automatically assigns the contextually appropriate sense to a given word. Instead of using a predefined monolingual sense-inventory, we use a language-independent framework by deriving the senses of a given word from word alignments on a multilingual parallel corpus, which we made available for corpus linguistics research. We built five WSD systems with English as the input language and translations in five supported languages (viz. French, Dutch, Italian, Spanish and German) as senses. The systems incorporate both binary translation features and local context features. The experimental results are very competitive, which confirms our initial hypothesis that each language contributes to the disambiguation of polysemous words. Because our system extracts all information from the parallel corpus, it offers a flexible language-independent approach, which implicitly deals with the sense distinctness issue and allows us to bypass the knowledge acquisition bottleneck for WSD.
-
Investigating the representation of migrants in the UK and Italian press: A cross-linguistic corpus-assisted discourse analysis
Author(s): Charlotte Taylor
pp. 368–400 (33)
This paper is a cross-linguistic corpus-assisted discourse study of the representation of migrants in the Italian and UK press and it adopts a two-stage methodological approach. In the first phase, the number of references to nationalities which collocate with refugees, asylum seekers, immigrants, migrants (and Italian equivalents) are calculated and this information is subsequently used to identify any ‘mismatch’ between the amount of attention that migrants from a given country receive in the media and the official population estimates. In the second, and most extensive stage, the representations of the foregrounded nationalities are analysed through the moral panic framework. Results show an extensive negative representation of some groups, but there is no evidence of a fully iterated moral panic relating to any of the nationalities investigated.
-
Making Google Books n-grams useful for a wide range of research on language change
Author(s): Mark Davies
pp. 401–416 (16)
The “standard” Google Books n-grams were released by Google in 2010, and they include more than 155 billion words of data for the American English data alone. Unfortunately, the standard interface is far too simplistic to allow many types of useful research on this massive dataset. In this paper, I discuss an alternative “advanced” architecture and interface for these datasets, which is freely available at googlebooks.byu.edu. This resource allows for a wide range of research on lexical, phraseological, syntactic, and semantic changes in English, in ways that would not be possible with the standard interface. With this new resource, researchers now have access to hundreds of billions of words of data, and can map out changes in English in ways that were not previously possible.
-
Text Variation Explorer: Towards interactive visualization tools for corpus linguistics
Author(s): Harri Siirtola, Tanja Säily, Terttu Nevalainen and Kari-Jouko Räihä
pp. 417–429 (13)
This paper reviews the gap between current methods of text visualization and the needs of corpus-linguistic research, and introduces a tool that takes a step towards bridging that gap. Current text visualization methods tend to treat the problem as a data-encoding issue only, and do not strive for interactive, tightly coupled representations of text that would foster discovery. The paper argues that such visualizations should always be linked for effortless movement between the text and its visualization, and that the visualization controls should provide continuous and immediate feedback to facilitate exploration. We introduce a tool, Text Variation Explorer (TVE), to demonstrate the aforementioned requirements. TVE allows visual and interactive examining of the behaviour of linguistic parameters affected by text window size and overlap, and in addition, performs interactive principal component analysis based on a user-given set of words.