Statistical tests for the analysis of learner corpus data
This paper provides an overview of several basic statistical tools used in corpus-based second language acquisition (SLA) research. I first discuss a few issues relevant to the analysis of learner corpus data. I then illustrate several widely used quantitative techniques and statistical visualizations, applying them to corpus data on the genitive alternation (the <i>of</i>-genitive vs. the <i>s</i>-genitive) produced by German learners and native speakers of English. The statistical methods discussed include a test for differences between frequencies (the chi-squared test), a test for differences between means/medians (the <i>U</i>-test), and a more advanced multifactorial extension, binary logistic regression.