International Journal of Corpus Linguistics - Volume 15, Issue 4, 2010
-
Choosing the best tools for comparative analyses of texts
Author(s): Eugène Mollet, Alison Wray, Tess Fitzpatrick, Naomi R. Wray and Margaret J. Wright
pp. 429–473 (45 pages)
What measurements should linguists use when comparing texts written by different writers? We report aspects of a systematic evaluation of 381 different language measures derived from 200 analytic tools, carried out during the pilot for a study exploring genetic contributions to language variation. The measures covered lexis, structure, meaning, and discourse features, and were evaluated with a focus on numerically capturing the qualitative features that linguists consider central to differentiating one text from another. We review principles for selecting analytic tools, and the choices the researcher faces in processing and analysing data. We then identify and demonstrate five of the measures, which between them provide a useful profile of different linguistic features, and note correlations with psychometric measures taken for each writer. We conclude with some caveats regarding general issues of validity and some indications of potential links between our work and research into authorship attribution for forensic purposes.
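As a minimal illustration of turning a text into numbers for comparison, the Python sketch below computes two simple lexical measures, type-token ratio and mean word length, for each writer's text. These are assumed examples of the kind of measure evaluated, not necessarily among the five the authors identify; the regex tokenizer and the function names (`tokenize`, `profile`) are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical naive tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    # Distinct word forms divided by running words; note it falls as texts get longer.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_word_length(tokens: list[str]) -> float:
    # Average number of characters per token.
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0


def profile(text: str) -> dict[str, float]:
    # Numeric profile of one text, so different writers can be compared side by side.
    tokens = tokenize(text)
    return {
        "tokens": float(len(tokens)),
        "ttr": type_token_ratio(tokens),
        "mean_word_length": mean_word_length(tokens),
    }


if __name__ == "__main__":
    texts = {"writer_a": "The cat sat on the mat.", "writer_b": "Dogs bark loudly at night."}
    for writer, text in texts.items():
        print(writer, profile(text))
```

Because raw type-token ratio depends on text length, a real comparison would need texts of similar size or a length-corrected variant.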
-
Automatic analysis of syntactic complexity in second language writing
Author(s): Xiaofei Lu
pp. 474–496 (23 pages)
We describe a computational system for automatic analysis of syntactic complexity in second language writing using fourteen different measures that have been explored or proposed in studies of second language development. The system takes a written language sample as input and produces fourteen indices of syntactic complexity of the sample based on these measures. The system is designed with advanced second language proficiency research in mind and is therefore developed and evaluated using college-level second language writing data from the Written English Corpus of Chinese Learners (Wen et al. 2005). Experimental results show that the system achieves very high reliability on unseen test data from the corpus. We illustrate how the system is used in an example application to investigate whether, and to what extent, each of these measures significantly differentiates between proficiency levels.
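For a sense of what a single such index computes, the sketch below approximates mean length of sentence (MLS: total words divided by total sentences), a measure commonly used in this line of research. It is a simplification under stated assumptions: sentences are split with a naive punctuation regex rather than the syntactic parsing a full system would rely on, and the function names are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def count_words(sentence: str) -> int:
    # Count alphanumeric word tokens, ignoring punctuation.
    return len(re.findall(r"[A-Za-z0-9']+", sentence))


def mean_length_of_sentence(text: str) -> float:
    """MLS = total words / total sentences (0.0 for empty input)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(count_words(s) for s in sentences) / len(sentences)


if __name__ == "__main__":
    sample = "I like tea. She prefers strong black coffee!"
    print(mean_length_of_sentence(sample))
```

Abbreviations such as "e.g." would break this splitter, which is one reason parser-based tools are preferred for reliable results.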
-
A corpus driven study of the potential for vocabulary learning through watching movies
Author(s): Stuart Webb
pp. 497–519 (23 pages)
In this corpus-driven study, the scripts of 143 movies, consisting of 1,267,236 running words, were analyzed using the RANGE program (Heatley et al. 2002) to determine the number of encounters with low-frequency words. Low-frequency words were operationalized as items from Nation’s (2004) 4th to 14th 1,000-word BNC lists. The results showed that in a single movie few words were encountered 10 or more times, indicating that only a small number of words may be learned through watching one movie. However, as the number of movies analyzed increased, so did the number of words encountered 10 or more times. Twenty-three percent of the word families from Nation’s (2004) 4th 1,000-word list were encountered 10 or more times in a set of 70 movies. This indicates that if learners watch movies regularly over a long period of time, there is potential for significant incidental learning.
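The counting step can be sketched in Python under simplifying assumptions: items are matched as exact lowercase word forms rather than the word families that RANGE and Nation's lists use, and the target set, the `threshold` parameter and the function names are hypothetical stand-ins. The sketch totals encounters per target item across a set of scripts and reports the share of targets reaching the 10-encounter threshold.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase word-form tokenizer (assumption; no word-family grouping).
    return re.findall(r"[a-z']+", text.lower())


def encounter_counts(scripts: list[str], targets: set[str]) -> Counter:
    # Total occurrences of each target item across all scripts combined.
    counts: Counter = Counter()
    for script in scripts:
        counts.update(t for t in tokenize(script) if t in targets)
    return counts


def share_met_threshold(scripts: list[str], targets: set[str], threshold: int = 10) -> float:
    # Proportion of target items met at least `threshold` times in the script set.
    if not targets:
        return 0.0
    counts = encounter_counts(scripts, targets)
    met = sum(1 for word in targets if counts[word] >= threshold)
    return met / len(targets)


if __name__ == "__main__":
    scripts = ["ominous " * 6, "ominous " * 4 + "gale"]
    targets = {"ominous", "gale", "zephyr"}
    print(share_met_threshold(scripts[:1], targets))  # one script only
    print(share_met_threshold(scripts, targets))      # both scripts pooled
```

Pooling scripts is what lets an item cross the threshold: in this toy input "ominous" appears 6 times in the first script alone but 10 times across both.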
-
Lexical gravity across varieties of English: An ICE-based study of n-grams in Asian Englishes
Author(s): Stefan Th. Gries and Joybrato Mukherjee
pp. 520–548 (29 pages)
In our earlier work on three Asian Englishes and British English, we showed how lexico-syntactic co-occurrence preferences for three argument structure constructions revealed differences between varieties that correlated well with Schneider’s (2003, 2007) model of evolutionary stages. Here, we turn to lexical co-occurrence preferences and investigate whether, and to what degree, n-grams distinguish between different modes and varieties in the same components of the International Corpus of English. Our approach to n-grams differs from previous work in that we use neither raw frequencies nor (problematic) MI values but rather the newly proposed measure of lexical gravity (cf. Daudaravičius & Marcinkevičienė 2004), which takes type frequencies into consideration. We show how lexical gravity can be extended to handle n-grams with n ≥ 3 and apply this method to our n-gram data; in addition, we suggest a new concept for describing the tendency of a word to occur in significant n-grams: lexical stickiness.
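For two-word sequences, lexical gravity as usually stated after Daudaravičius & Marcinkevičienė (2004) is G(x, y) = log(f(x,y)·n(x)/f(x)) + log(f(x,y)·n′(y)/f(y)), where f is frequency, n(x) is the number of distinct word types that follow x, and n′(y) is the number of distinct types that precede y. The Python sketch below covers that bigram case only; it does not implement the article's n ≥ 3 extension or lexical stickiness. Natural logarithms, whole-corpus unigram counts and the function name are assumptions.

```python
import math
from collections import Counter, defaultdict


def lexical_gravity(tokens: list[str]) -> dict[tuple[str, str], float]:
    """Bigram lexical gravity; a sketch, not the article's n-gram implementation."""
    unigram = Counter(tokens)                 # f(x): frequency of each word
    bigram = Counter(zip(tokens, tokens[1:]))  # f(x, y): frequency of each adjacent pair
    right_types: dict[str, set[str]] = defaultdict(set)  # n(x): distinct words after x
    left_types: dict[str, set[str]] = defaultdict(set)   # n'(y): distinct words before y
    for x, y in bigram:
        right_types[x].add(y)
        left_types[y].add(x)

    scores: dict[tuple[str, str], float] = {}
    for (x, y), fxy in bigram.items():
        # Forward term rewards pairs where x is frequent with y yet has many right neighbours.
        forward = math.log(fxy * len(right_types[x]) / unigram[x])
        # Backward term does the same from y's perspective, using its left neighbours.
        backward = math.log(fxy * len(left_types[y]) / unigram[y])
        scores[(x, y)] = forward + backward
    return scores


if __name__ == "__main__":
    for pair, g in sorted(lexical_gravity("a b a b a c".split()).items()):
        print(pair, round(g, 3))
```

Unlike MI, the type counts n(x) and n′(y) reward a pair whose words otherwise combine with many different neighbours, which is the sense in which gravity "takes type frequencies into consideration".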