Volume 25, Issue 4
  • ISSN 1384-6655
  • E-ISSN: 1569-9811



This paper investigates the contribution of author/idiolect vs. register/type-of-text – as the most salient factors influencing the final shape of a text – towards explaining the variation observed in Czech texts. Since it is almost impossible to explore the effect of these factors on authentic data, we used elicited letters collected in a fully crossed experimental design (representative sample of 200 authors × four elicitation scenarios serving as a proxy to register variation). The variation encompassed by the elicited texts is analyzed through the lens of a general-purpose multi-dimensional model of Czech. Using triangulation via three established statistical methods and one devised for the purpose of this study, we find that register matters a great deal, explaining 1.5 times as much variation overall as idiolect. This should be taken into account when designing research in sociolinguistics or variation studies in general.






  • Article Type: Research Article
Keyword(s): Czech; idiolect; multi-dimensional analysis; register; variation