Volume 2, Issue 2
  • ISSN 2542-9477
  • E-ISSN: 2542-9485



In this article, we present the results of a corpus-based study exploring whether different facets of text complexity can be singled out automatically in a general-purpose corpus. To this end, we use factor analysis as applied in Biber’s multi-dimensional analysis framework. We evaluate the factor solution by correlating factor scores with readability scores, to ascertain whether the selected solution matches an independent measurement of readability, a notion tightly linked to text complexity. The corpus used in the study is the Swedish national corpus, the Stockholm-Umeå Corpus (SUC). The SUC contains subject-based text varieties (e.g., hobby), press genres (e.g., editorials), and mixed categories (e.g., miscellaneous). We refer to them collectively as ‘registers’. Results show that, despite some caveats, it is indeed possible to elicit and interpret facets of text complexity using factor analysis. We propose a tentative text complexity profiling of the SUC registers.
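The evaluation step described above (extract factors from per-text linguistic features, then correlate factor scores with a readability score) can be illustrated with a minimal sketch. It is not the study's implementation: it assumes synthetic data, uses an unrotated principal-component extraction as a simple stand-in for factor analysis, and the names `extract_factors`, `pearson`, `X`, and `readability` are hypothetical.

```python
import numpy as np


def extract_factors(X, n_factors):
    """Return (loadings, scores) from a principal-component extraction.

    X is a (texts x features) matrix. This is a simplified stand-in for
    factor analysis: no rotation and no communality estimation.
    """
    # Standardize each feature to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Eigen-decompose the feature correlation matrix.
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    order = np.argsort(eigvals)[::-1][:n_factors]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: correlations between features and components.
    loadings = eigvecs * np.sqrt(eigvals)
    # Component scores, rescaled to unit variance.
    scores = Z @ eigvecs / np.sqrt(eigvals)
    return loadings, scores


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic data (an assumption for illustration only): 200 texts and
# 6 features, three of which are driven by one latent "complexity" variable.
rng = np.random.default_rng(0)
n_texts = 200
latent = rng.normal(size=n_texts)
weights = np.array([0.9, 0.8, 0.7, 0.0, 0.0, 0.0])
X = latent[:, None] * weights + rng.normal(scale=0.5, size=(n_texts, 6))

# A hypothetical readability score that partly reflects the same variable.
readability = latent + rng.normal(scale=0.3, size=n_texts)

loadings, scores = extract_factors(X, n_factors=2)
r = pearson(scores[:, 0], readability)
print(f"factor 1 vs readability: r = {r:.2f}")
```

The sign of each extracted component is arbitrary, so only the magnitude of `r` is informative here. A strong correlation would suggest that the factor captures something close to what the readability score measures.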




