International Journal of Learner Corpus Research - Volume 7, Issue 1, 2021
Automated annotation of learner English
Author(s): Adriana Picoral, Shelley Staples and Randi Reppen
pp. 17–52
Abstract: This paper explores the use of natural language processing (NLP) tools and their utility for learner language analyses through a comparison of automatic linguistic annotation against a gold standard produced by humans. While a number of automated annotation tools for English are currently available, little research exists on the accuracy of these tools when annotating learner data. We compare the performance of three linguistic annotation tools (a tagger and two parsers) on academic writing in English produced by learners (both L1 and L2 English speakers). We focus on lexico-grammatical patterns, including both phrasal and clausal features, since these are frequently investigated in applied linguistics studies. Our results report both precision and recall of annotation output for argumentative texts in English across four L1s: Arabic, Chinese, English, and Korean. We close with a discussion of the benefits and drawbacks of using automatic tools to annotate learner language.
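The evaluation described in this abstract boils down to comparing automatic labels against human gold-standard labels. As a minimal sketch (not the authors' actual evaluation code), the following computes per-label precision and recall over aligned token sequences; the tag set and the toy sentence are hypothetical.

```python
def precision_recall(gold, auto, label):
    """Precision and recall for one label over aligned token sequences."""
    tp = sum(1 for g, a in zip(gold, auto) if g == a == label)
    fp = sum(1 for g, a in zip(gold, auto) if a == label and g != label)
    fn = sum(1 for g, a in zip(gold, auto) if g == label and a != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical gold-standard vs. automatic tags for one learner sentence.
gold_tags = ["NOUN", "VERB", "NOUN", "ADP", "NOUN"]
auto_tags = ["NOUN", "VERB", "ADJ",  "ADP", "NOUN"]

p, r = precision_recall(gold_tags, auto_tags, "NOUN")
print(f"NOUN precision={p:.2f}, recall={r:.2f}")  # precision=1.00, recall=0.67
```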
Automatic analysis of passive constructions in Korean
Author(s): Gyu-Ho Shin and Boo Kyung Jung
pp. 53–82
Abstract: The present study aims to explore the applicability of automatic analysis to L2-Korean learner corpora, with a special focus on learners’ use of a clause-level construction. For this purpose, we investigate L1-Mandarin L2-Korean learners’ written production of two passive construction types in Korean – suffixal and periphrastic – by devising a pattern-extraction process through NLP techniques. We focus on reporting how the passive constructions are identified and extracted from learner writing automatically, given language-specific features involving the passive. A total of 72 essays were analysed by adapting an existing pipeline (developed by Shin, forthcoming), with enhanced tokenisation and annotation through manual revision of the data. Results showed that our automatic pattern-finder identified more instances than manual extraction for the suffixal passive and yielded a perfect match with manual extraction for the periphrastic passive. Implications of the findings are discussed in regard to strengths and drawbacks of the automatic analysis of learner writing, with suggestions for improving currently available tools for learner corpus research in Korean.
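As a hypothetical, much-simplified illustration of this kind of rule-based pattern extraction (the actual pipeline, developed by Shin, is considerably more elaborate), the sketch below classifies a morpheme-segmented, POS-tagged clause as containing a suffixal passive (verb stem plus one of the passive suffixes -이-, -히-, -리-, -기-) or a periphrastic passive (connective -어/-아 followed by 지다). The Sejong-style tags and the toy analyses are assumptions, not output of any particular Korean analyser.

```python
SUFFIXAL_MARKERS = {"이", "히", "리", "기"}  # passive derivational suffixes

def classify_passive(morphemes):
    """Return 'suffixal', 'periphrastic', or None for one analysed clause.

    `morphemes` is a list of (form, tag) pairs; tags are assumed to follow
    a Sejong-style scheme (VV verb, XSV verbal suffix, EC connective, ...).
    """
    for i, (form, tag) in enumerate(morphemes):
        # Suffixal passive: verb stem + one of the passive suffixes.
        if tag == "XSV" and form in SUFFIXAL_MARKERS:
            return "suffixal"
        # Periphrastic passive: connective -어/-아 followed by 지다.
        if form in {"어", "아"} and i + 1 < len(morphemes) \
                and morphemes[i + 1][0].startswith("지"):
            return "periphrastic"
    return None

# 먹히다 'be eaten': stem 먹 + passive suffix 히 + ending 다.
print(classify_passive([("먹", "VV"), ("히", "XSV"), ("다", "EF")]))
# 만들어지다 'be made': stem 만들 + connective 어 + 지 + ending 다.
print(classify_passive([("만들", "VV"), ("어", "EC"), ("지", "VX"), ("다", "EF")]))
```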
Analyzing the linguistic complexity of German learner language in a reading comprehension task
Author(s): Zarah Weiss and Detmar Meurers
pp. 83–130
Abstract: While linguistic complexity analysis of learner language has traditionally been based mostly on essays, there is increasing interest in other task types. This is crucial for obtaining a broader empirical basis for characterizing language proficiency, and it highlights the need to advance our understanding of how task and learner properties interact in shaping the linguistic complexity of learner productions. It also makes it important to determine which complexity measures generalize well across which tasks.
In this paper, we investigate the linguistic complexity of answers to reading comprehension questions written by foreign language learners of German at the college level. Analyzing the corpus with computational linguistic methods identifying a wide range of complexity features, we explore which linguistic complexity analyses can successfully be performed for such short answers, how learner proficiency impacts the results, how generalizable they are across different contexts, and how the quality of the underlying analysis impacts the results.
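To make the notion of a complexity feature concrete, here is a minimal, hypothetical sketch of two surface-level measures computed over a short answer; systems like the one used in the paper derive a far wider range of features from full syntactic analyses, so this is illustrative only.

```python
import re

def complexity_measures(text):
    """Two illustrative surface measures for a short written answer."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    return {
        "mean_sentence_length": len(tokens) / max(len(sentences), 1),
        "type_token_ratio": len(set(tokens)) / max(len(tokens), 1),
    }

answer = "Der Mann liest ein Buch. Er liest es gern."
print(complexity_measures(answer))
# {'mean_sentence_length': 4.5, 'type_token_ratio': 0.888...}
```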
Assessing the impact of automatic dependency annotation on the measurement of phraseological complexity in L2 Dutch
Author(s): Rachel Rubin
pp. 131–162
Abstract: The extraction of phraseological units operationalized in phraseological complexity measures (Paquot, 2019) relies on automatic dependency annotations, yet the suitability of annotation tools for learner language is often overlooked. In the present article, two Dutch dependency parsers, Alpino (van Noord, 2006) and Frog (van den Bosch et al., 2007), are evaluated for their performance in automatically annotating three types of dependency relations (verb + direct object, adjectival modifier, and adverbial modifier relations) across three proficiency levels of L2 Dutch. These observations then serve as the basis for an investigation into the impact of automatic dependency annotation on phraseological sophistication measures. Results indicate that both learner proficiency and the type of dependency relation function as moderating factors in parser performance. Phraseological complexity measures computed on the basis of both automatic and manual dependency annotations demonstrate moderate to high correlations, reflecting a moderate to low impact of automatic annotation on subsequent analyses.
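As an illustration of what extracting one of these dependency relations involves, the sketch below pulls verb + direct object pairs from parser output in the standard CoNLL-U format. The Dutch example sentence and its analysis are hypothetical, and Alpino and Frog each use their own native output formats, so this is only a schematic stand-in.

```python
def verb_object_pairs(conllu_sentence):
    """Yield (verb lemma, object lemma) pairs from one CoNLL-U sentence."""
    rows = [line.split("\t") for line in conllu_sentence.splitlines()
            if line and not line.startswith("#")]
    lemma_by_id = {row[0]: row[2] for row in rows}  # token ID -> lemma
    for row in rows:
        head, deprel = row[6], row[7]   # CoNLL-U columns 7 and 8
        if deprel == "obj":             # direct object relation
            yield lemma_by_id.get(head, "?"), row[2]

# 'Zij leest een boek' (She reads a book), hypothetical parse.
sent = """1\tZij\tzij\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tleest\tlezen\tVERB\t_\t_\t0\troot\t_\t_
3\teen\teen\tDET\t_\t_\t4\tdet\t_\t_
4\tboek\tboek\tNOUN\t_\t_\t2\tobj\t_\t_"""

print(list(verb_object_pairs(sent)))  # [('lezen', 'boek')]
```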
How operationalizations of word types affect measures of lexical diversity
Author(s): Scott Jarvis and Brett James Hashimoto
pp. 163–194
Abstract: This study tests three measures of lexical diversity (LD), each using five operationalizations of word types. The measures include MTLD (measure of textual lexical diversity), MTLD-W (moving average MTLD with wrap-around measurement), and MATTR (moving average type-token ratio). Each of these measures is tested with types operationalized as orthographic forms, lemmas using automated POS tags, lemmas using manually corrected POS tags, flemmas (list-based lemmas that do not distinguish between parts of speech), and word families. These measures are applied to 60 narrative texts written in English by adolescent native speakers of English (n = 13), Finnish (n = 31), and Swedish (n = 16). Each individual LD measure is evaluated in relation to how well it correlates with the mean LD ratings of 55 human raters whose inter-rater reliability was exceedingly high (Cronbach’s alpha = .980). The overall results show that the three measures are comparable but two of the operationalizations of types produce mixed results across measures.
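Of the three measures, MATTR is the simplest to state: the type-token ratio is computed over every fixed-length window of the text and the resulting ratios are averaged. A minimal reference implementation follows; the window size and toy text are arbitrary, and the choice of type operationalization (orthographic form, lemma, flemma, word family) is decided by whatever preprocessing produces the token list.

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio over fixed-length windows."""
    if len(tokens) < window:          # fall back to plain TTR for short texts
        return len(set(tokens)) / len(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

tokens = "the cat sat on the mat and the dog sat on the rug".split()
print(round(mattr(tokens, window=5), 3))  # 0.911
```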